Generic properties of subgroups of free groups and finite presentations

Frederique Bassino, Cyril Nicaud, and Pascal Weil 

Abstract. Asymptotic properties of finitely generated subgroups of free groups,
and of finite group presentations, can be considered in several fashions, depending
on the way these objects are represented and on the distribution assumed
on these representations: here we assume that they are represented by tuples
of reduced words (generators of a subgroup) or of cyclically reduced words
(relators). Classical models consider fixed size tuples of words (e.g. the few-generator
model) or exponential size tuples (e.g. Gromov’s density model), and
they usually consider that equal length words are equally likely. We generalize
both the few-generator and the density models with probabilistic schemes
that also allow variability in the size of tuples and non-uniform distributions
on words of a given length.

Our first results rely on a relatively mild prefix-heaviness hypothesis on
the distributions, which states essentially that the probability of a word decreases
exponentially fast as its length grows. Under this hypothesis, we generalize
several classical results: exponentially generically a randomly chosen
tuple is a basis of the subgroup it generates, this subgroup is malnormal and
the tuple satisfies a small cancellation property, even for exponential size tuples.
In the special case of the uniform distribution on words of a given length,
we give a phase transition theorem for the central tree property, a combinatorial
property closely linked to the fact that a tuple freely generates a subgroup.
We then further refine our results when the distribution is specified
by a Markovian scheme, and in particular we give a phase transition theorem
which generalizes the classical results on the densities up to which a tuple of
cyclically reduced words chosen uniformly at random exponentially generically
satisfies a small cancellation property, and beyond which it presents a trivial
group.


This paper is part of the growing body of literature on asymptotic properties of 
subgroups of free groups and of finite group presentations, which goes back at least 
to the work of Gromov [10] and Arzhantseva and Ol’shanskii [1]. As in much of the
recent literature, the accent is on so-called generic properties, that is, properties 
whose probability tends to 1 when the size of instances grows to infinity. A theory 
of genericity and its applications to complexity theory was initiated by Kapovich,
Myasnikov, Schupp and Shpilrain mi, and developed in a number of papers, see 
Kapovich for a recent discussion m 

Genericity, and more generally asymptotic properties, depends on the fashion 
in which input is represented: finitely presented groups are usually given by finite 
presentations, i.e. tuples of cyclically reduced words; finitely generated subgroups 
of free groups can be represented by tuples of words (generators) or Stallings graphs. 
The representation by Stallings graphs is investigated by the authors, along with 
Martino and Ventura in laisiii] but we will not discuss it in this paper: we are 
dealing, like most of the literature, with tuples of words. 

There are, classically, two main models (see Section 2.2): the few words model,
where an integer k is fixed and one considers k-tuples of words of length at most n,
when n tends to infinity, and the density model, where we consider tuples of
cyclically reduced words of length n, whose size grows exponentially
with n, see e.g. [13 m mils]. 

Typical properties investigated include the following (see in particular the
sections below): whether a random tuple h freely generates the subgroup $H = \langle h \rangle$
mm , whether H is malnormal mm or Whitehead minimal mm, whether 
the finite presentation with relators h has a small cancellation property, or whether 
the group it presents is infinite or trivial |23j . 

All these models implicitly assume the uniform distribution on the set of reduced
words of equal length (Ollivier also considers non-uniform distributions).

We introduce (Section 3) a model for probability distributions on tuples of reduced
words that is sufficiently general to extend the few words model and Gromov’s
density model mentioned above, and to leave space for non-uniform distributions.
Like these two models, ours assumes that a tuple h of words is generated by independently
drawing words of given lengths, but it also handles independently the
size of h and the lengths of the words in h.

Our first set of results assumes a prefix-heaviness hypothesis on the probability
distribution on words: the probability of drawing a word decreases exponentially
fast as its length grows (precise definitions are given in Section 3). It is a natural
hypothesis if we imagine that our probabilistic source generates words one letter at 
a time, from left to right. This relatively mild hypothesis suffices to obtain general 
results on the exponential genericity of a certain geometric property of the Stallings 
graph of the subgroup H generated by a randomly chosen tuple h (the central tree
property, implicitly considered in mm and explicitly in (3), of the fact that h 
freely generates H, and of the malnormality of H, see Section [TSl 

In Section [TGI we apply these general results to the uniform distribution and 
generalize known results in two directions. Firstly we consider random exponential
size tuples, for which we give a phase transition theorem for the central tree
property: it holds exponentially generically up to density |, and fails exponentially 
generically at densities greater than j (Proposition 3.21). In particular, a random 
tuple is exponentially generically a basis of the subgroup it generates up to density 
|, but we cannot say anything of that property at higher densities. 

We also extend Jitsukawa’s result on malnormality [12] from fixed size to exponential
size tuples under uniform distribution, up to a fixed positive density (Proposition 3.22).
In view of the methods used to establish this result, it is likely that this value is
not optimal. 

Secondly, we show that the height of the central tree of a random fixed size
tuple h, which measures the amount of initial cancellation between the elements of h
and $h^{-1}$, is generically less than any prescribed unbounded non-decreasing function
(Proposition 3.24). Earlier results only showed that this height was exponentially
generically bounded by any linear function. 

We then introduce Markovian automata, a probabilistic automata-theoretic
model, to define explicit instances of prefix-heavy distributions (Section 4). Additional
assumptions like irreducibility or ergodicity lead to the computation of
precise bounds for the parameters of prefix-heaviness. In particular, we prove a
phase transition theorem for ergodic Markovian automata (Section 4.4), showing
that small cancellation properties generically hold up to a certain density, and 
generically do not hold at higher densities. More precisely, if apj is the coincidence 
probability of the Markovian automaton, Property C'(A) holds exponentially gener¬ 
ically at Q;[ 2 ]-density less than ^ (that is: for random tuples of size for some 

d < -1), and fails exponentially generically at Q;[ 2 ]-densities greater than |. We 
also show that at a[ 2 ]-densities greater than i, a random tuple of cyclically reduced 
words generically presents a degenerate group (see Proposition 4.23 for a precise 
definition). These results generalize the classical results on uniform distribution in 
Ollivier |23L I24| . It remains to be seen whether our methods can be applied to 
fill the gap, say, between a[ 2 ]-density ^ and where small cancellation property 
C"(g) generically does not hold yet the presented group might be hyperbolic, see 


Some of the definitions in this paper, notably that of Markovian automata, were 
introduced by the authors in [2], and some of the results were announced there as
well. The results in the present paper are more precise, and subsume those of [2]. 


1. Free groups, subgroups and presentations 

In this section, we set the notation and basic definitions of the properties of 
subgroups of free groups and finite presentations which we will consider. 

1.1. Free groups and reduced words. Let A be a finite non-empty set,
which will remain fixed throughout the paper, with $|A| = r$, and let $\tilde A$ be the
symmetrized alphabet, namely the disjoint union of A and a set of formal inverses
$A^{-1} = \{a^{-1} \mid a \in A\}$. By convention, the formal inverse operation is extended
to $\tilde A$ by letting $(a^{-1})^{-1} = a$ for each $a \in A$. A word in $\tilde A^*$ (that is: a word written
on the alphabet $\tilde A$) is reduced if it does not contain length 2 factors of the form
$aa^{-1}$ ($a \in \tilde A$). If a word is not reduced, one can reduce it by iteratively deleting
every factor of the form $aa^{-1}$. The resulting reduced word is uniquely determined:
it does not depend on the order of the cancellations. For instance, $u = aabb^{-1}a^{-1}$
reduces to $aaa^{-1}$, and thence to a.
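
The reduction procedure can be sketched in a few lines of Python. The encoding is an assumption made for this illustration only and is not part of the paper: a letter of A is a lowercase character and its formal inverse is the corresponding uppercase character, so that `str.swapcase` computes inverses. The function names are hypothetical.

```python
def inverse(word: str) -> str:
    """Formal inverse of a word: reverse it and invert every letter."""
    return word[::-1].swapcase()


def reduce_word(word: str) -> str:
    """Freely reduce a word by cancelling factors x x^{-1}, using a stack."""
    stack = []
    for letter in word:
        if stack and stack[-1] == letter.swapcase():
            stack.pop()            # the last kept letter cancels with this one
        else:
            stack.append(letter)
    return "".join(stack)


def multiply(u: str, v: str) -> str:
    """Product in the free group: reduce the concatenation uv."""
    return reduce_word(u + v)


# The example from the text: u = a a b b^{-1} a^{-1}
print(reduce_word("aabBA"))
```

A single left-to-right pass suffices here because the reduced form does not depend on the order of cancellations.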

The set F of reduced words is naturally equipped with a group structure, where
the product $u \cdot v$ is the (reduced) word obtained by reducing the concatenation uv.
This group is called the free group on A. More generally, every group isomorphic
to F, say, $G = \varphi(F)$ where $\varphi$ is an isomorphism, is said to be a free group, freely
generated by $\varphi(A)$. The set $\varphi(A)$ is called a basis of G. Note that if $r \ge 2$, then F
has infinitely many bases: if, for instance, $a \ne b$ are elements of A, then replacing
a by $b^n a b^m$ (for some integers n, m) yields a basis. The rank of F (or of any
isomorphic free group) is the cardinality $|A|$ of A, and one shows that this notion
is well-defined in the following sense: every basis of F has the same cardinality.

Let x, y be elements of a group G. We say that y is a conjugate of x if there
exists an element $g \in G$ such that $y = g^{-1}xg$, which we write $y = x^g$. The notation
is extended to subsets of G: if $H \subseteq G$, then $H^g = \{x^g \mid x \in H\}$. Conjugacy
of elements of the free group F is characterized as follows. Say that a word u
is cyclically reduced if it is non-empty, reduced and its first and last letters
are not mutually inverse (or equivalently, if $u^2$ is non-empty and reduced). For
instance, $ab^{-1}a^{-1}bbb$ is cyclically reduced, but $ab^{-1}a^{-1}bba^{-1}$ is not.

For every reduced word u, let $\kappa(u)$ denote its cyclic reduction, which is the shortest
word v such that $u = wvw^{-1}$ for some word w. For instance, $\kappa(ab^{-1}a^{-1}bba^{-1}) = a^{-1}b$.
It is easily verified that two reduced words u and v are conjugates if and only
if $\kappa(u)$ and $\kappa(v)$ are cyclic conjugates (that is: there exist words x and y such that
$\kappa(u) = xy$ and $\kappa(v) = yx$).
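
The following Python sketch illustrates the cyclic reduction and the conjugacy criterion just stated. It assumes the same illustrative encoding as before (lowercase letters for A, uppercase for formal inverses), assumes that its inputs are already reduced words, and uses hypothetical function names.

```python
def cyclic_reduction(u: str) -> str:
    """Return kappa(u) for a reduced word u.

    Matching first/last letters x ... x^{-1} are stripped until the first
    and last letters are no longer mutually inverse.
    """
    start, end = 0, len(u)
    while end - start >= 2 and u[start] == u[end - 1].swapcase():
        start += 1
        end -= 1
    return u[start:end]


def are_conjugate(u: str, v: str) -> bool:
    """Conjugacy test for reduced words via cyclic conjugacy of kappa(u), kappa(v)."""
    ku, kv = cyclic_reduction(u), cyclic_reduction(v)
    # kv = yx for some factorization ku = xy  iff  |ku| = |kv| and kv occurs in ku ku
    return len(ku) == len(kv) and kv in ku + ku


# The example from the text: kappa(a b^{-1} a^{-1} b b a^{-1}) = a^{-1} b
print(cyclic_reduction("aBAbbA"))
```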

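Let $\mathcal{R}_n$ (resp. $\mathcal{C}_n$) denote the set of all reduced (resp. cyclically reduced) words
of length $n \ge 1$, and let $\mathcal{R} = \bigcup_{n \ge 1} \mathcal{R}_n$ and $\mathcal{C} = \bigcup_{n \ge 1} \mathcal{C}_n$ be the set of all reduced
words, and all cyclically reduced words, respectively.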

Every word of length 1 is cyclically reduced, so $|\mathcal{R}_1| = |\mathcal{C}_1| = 2r$. A reduced
word of length $n \ge 2$ is of the form ua, where u is reduced and a is not the inverse of
the last letter of u. An easy induction shows that there are
$|\mathcal{R}_n| = 2r(2r-1)^{n-1} = \frac{2r}{2r-1}(2r-1)^n$ reduced words of length $n \ge 2$.

Similarly, if $n \ge 2$, then $\mathcal{C}_n$ is the set of words of the form ua, where u is a
reduced word and $a \in \tilde A$ is neither the inverse of the first letter of u, nor the inverse
of its last letter: for a given u, there are either $2r - 1$ or $2r - 2$ such words, depending
on whether the first and last letters of u are equal. In particular, the number of words
in $\mathcal{C}_n$ satisfies

$$2r(2r-1)^{n-2}(2r-2) \le |\mathcal{C}_n| \le \frac{2r}{2r-1}(2r-1)^n,$$

and hence $|\mathcal{C}_n| = \Theta((2r-1)^n)$.
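
These counts can be checked by exhaustive enumeration for small values of r and n. The sketch below is only a sanity check under the illustrative encoding used above (lowercase letters for A, uppercase for inverses); the function names are hypothetical and the enumeration is exponential in n.

```python
from itertools import product


def count_words(r: int, n: int) -> tuple[int, int]:
    """Count reduced and cyclically reduced words of length n over r generators."""
    letters = [chr(ord("a") + i) for i in range(r)]
    alphabet = letters + [x.upper() for x in letters]
    reduced = cyclic = 0
    for w in product(alphabet, repeat=n):
        # discard words containing a factor x x^{-1}
        if any(w[i] == w[i + 1].swapcase() for i in range(n - 1)):
            continue
        reduced += 1
        # cyclically reduced: first and last letters are not mutually inverse
        if n == 1 or w[0] != w[-1].swapcase():
            cyclic += 1
    return reduced, cyclic


def bounds_hold(r: int, n: int) -> bool:
    """Compare the enumeration with the formula for |R_n| and the bounds on |C_n| (n >= 2)."""
    reduced, cyclic = count_words(r, n)
    lower = 2 * r * (2 * r - 1) ** (n - 2) * (2 * r - 2)
    upper = 2 * r * (2 * r - 1) ** (n - 1)
    return reduced == upper and lower <= cyclic <= upper


for n in range(2, 7):
    print(n, count_words(2, n), bounds_hold(2, n))
```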


1.2. Subgroups and presentations. Given a tuple $h = (h_1, \dots, h_k)$ of elements
of F, let $h^{\pm} = (h_1, h_1^{-1}, \dots, h_k, h_k^{-1})$ and let $\langle h \rangle$ denote the subgroup of F
generated by the elements of h, that is, the set of all the elements of F which can
be written as a product of elements of $h^{\pm}$. It is a classical result of Nielsen that
every such subgroup is free [22].

An important property of subgroups is malnormality, which is related to geometric
considerations: a subgroup H of a group G is malnormal if
$H \cap H^x$ is trivial for every $x \notin H$. It is decidable whether a finitely generated
subgroup $\langle h \rangle$ is malnormal ([12, 15], see Section 1.3), whereas malnormality is not
decidable in general hyperbolic groups [6].

A tuple h of elements of $F(A)$ can also be considered as a set of relators in a
group presentation. More precisely, we denote by $\langle A \mid h \rangle$ the group with generator
set A and relators the elements of h, namely the quotient of $F(A)$ by the normal
subgroup generated by h. It is customary to consider such a group presentation
only when h consists only of cyclically reduced words, since $\langle A \mid h \rangle = \langle A \mid \kappa(h) \rangle$.

The small cancellation property is a combinatorial property of a group presentation,
with far-reaching consequences on the quotient group. Let h be a tuple of
cyclically reduced words. A piece in h is a word u with at least two occurrences as
a prefix of a cyclic conjugate of a word in $h^{\pm}$. Let $0 < \lambda < 1$. The tuple h (or the
group presentation $\langle A \mid h \rangle$) has the small cancellation property $C'(\lambda)$ if whenever a
piece u occurs as a prefix of a cyclic conjugate w of a word in $h^{\pm}$, then $|u| < \lambda|w|$.
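
A brute-force test of this definition can be sketched as follows. This is an illustration under stated assumptions, not an algorithm from the paper: words use the lowercase/uppercase encoding for letters and inverses, relators are assumed to be non-empty and cyclically reduced, and two occurrences are counted as distinct when they come from different (word of $h^{\pm}$, rotation) pairs, even if the resulting cyclic conjugates coincide. The function names are hypothetical.

```python
def inverse(word: str) -> str:
    """Formal inverse of a word (uppercase letters encode inverses)."""
    return word[::-1].swapcase()


def common_prefix_length(u: str, v: str) -> int:
    """Length of the longest common prefix of u and v."""
    n = 0
    for x, y in zip(u, v):
        if x != y:
            break
        n += 1
    return n


def satisfies_small_cancellation(relators: list[str], lam: float) -> bool:
    """Check C'(lam): every piece prefix u of a cyclic conjugate w has |u| < lam |w|."""
    symmetrized = [w for h in relators for w in (h, inverse(h))]
    # all cyclic conjugates, one per (word, rotation) pair
    conjugates = [w[i:] + w[:i] for w in symmetrized for i in range(len(w))]
    for a, w in enumerate(conjugates):
        # the longest piece that is a prefix of w
        longest_piece = max(
            (common_prefix_length(w, x) for b, x in enumerate(conjugates) if a != b),
            default=0,
        )
        if longest_piece >= lam * len(w):
            return False
    return True


# The commutator a b a^{-1} b^{-1}: its pieces are single letters, of length 1 = |w| / 4.
print(satisfies_small_cancellation(["abAB"], 0.5))
```

The check is quadratic in the total length of the relators; it is meant to make the definition concrete rather than to be efficient.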

The following properties are well-known. We do not give the definition of the 
group-theoretic properties in this statement and refer the reader to m or to the 
comprehensive survey [24].

Proposition 1.1. If h is a tuple of cyclically reduced words satisfying $C'(\frac{1}{6})$,
then $G = \langle A \mid h \rangle$ is infinite, torsion-free and word-hyperbolic. In addition, it has
solvable word problem (by Dehn’s algorithm) and solvable conjugacy problem.

Moreover, if h and g both have property $C'(\frac{1}{6})$ and if they present the same
group, then $h^{\pm} = g^{\pm}$ up to the order of the elements in the tuples.

1.3. Graphical representation of subgroups and the central tree property.
A privileged tool for the study of subgroups of free groups is provided by
Stallings graphs: if H is a finitely generated subgroup of F, its Stallings graph
$\Gamma(H)$ is a finite graph of a particular type, uniquely representing H, whose computation
was first made explicit by Stallings [31]. The mathematical object itself
is already described by Serre [29]. The description we give below differs slightly
from Serre’s and Stallings’; it follows [35, 15, 33, 21, 30] and it emphasizes the
combinatorial, graph-theoretical aspect, which is more conducive to the discussion
of algorithmic properties.

A finite A-graph is a pair $\Gamma = (V, E)$ with V finite and $E \subseteq V \times A \times V$, such
that if both $(u,a,v)$ and $(u,a,v')$ are in E then $v = v'$, and if both $(u,a,v)$ and
$(u',a,v)$ are in E then $u = u'$. Let $v \in V$. The pair $(\Gamma, v)$ is said to be admissible
if the underlying graph of $\Gamma$ is connected (that is: the undirected graph obtained
from $\Gamma$ by forgetting the letter labels and the orientation of edges), and if every
vertex $w \in V$, except possibly v, occurs in at least two edges in E.

Every admissible pair $(\Gamma, 1)$ represents a unique subgroup H of $F(A)$ in the
following sense: if u is a reduced word, then $u \in H$ if and only if u labels a loop
at 1 in $\Gamma$ (by convention, an edge $(u, a, v)$ can be read from u to v with label a,
or from v to u with label $a^{-1}$). One can show that H is finitely generated. More
precisely, the following procedure yields a basis of H: choose a spanning tree T of
$\Gamma$; for each edge $e = (u, a, v)$ of $\Gamma$ not in T, let $b_e = x_u a x_v^{-1}$, where $x_u$ (resp. $x_v$) is
the only reduced word labeling a path in T from 1 to u (resp. v); then the $b_e$ freely
generate H and as a result, the rank of H is exactly $|E| - |V| + 1$.

Conversely, if $h = (h_1, \dots, h_k)$ is a tuple of reduced words, then the subgroup
$H = \langle h \rangle$ admits a Stallings graph, written $(\Gamma(H), 1)$, which can be computed
effectively and efficiently. A quick description of the algorithm is as follows. We
first build a graph with edges labeled by letters in $\tilde A$, and then reduce it to an
A-graph using foldings. First build a vertex 1. Then, for every $1 \le i \le k$, build
a loop with label $h_i$ from 1 to 1, adding $|h_i| - 1$ new vertices. Change every edge
$(u, a^{-1}, v)$ labeled by a letter of $A^{-1}$ into an edge $(v, a, u)$. At this point, we have
constructed the so-called bouquet of loops labeled by the $h_i$.

Then iteratively identify the vertices v and w whenever there exists a vertex
u and a letter $a \in A$ such that either both $(u,a,v)$ and $(u,a,w)$ or both $(v,a,u)$
and $(w,a,u)$ are edges in the graph (the corresponding two edges are folded, in
Stallings’ terminology). 

The resulting graph $\Gamma$ is such that $(\Gamma, 1)$ is admissible, the reduced words
labeling a loop at 1 are exactly the elements of H and, very much like in the (1-dimensional)
reduction of words, that graph does not depend on the order used to
perform the foldings. The graph $(\Gamma(H), 1)$ can be computed in time almost linear
(precisely: in time $O(n \log^* n)$ [33]).
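
The bouquet-and-folding procedure described above can be sketched naively in Python. This is not the almost linear algorithm of [33]: each fold rescans the edge set, so the running time is quadratic. The sketch assumes non-empty reduced generators in the lowercase/uppercase encoding, and names the base vertex 0 instead of 1; all function names are hypothetical.

```python
def find_fold(edges):
    """Return two distinct vertices to identify, or None if the graph is folded."""
    outgoing, incoming = {}, {}
    for (u, a, v) in edges:
        # two edges with the same source and label but different targets
        if outgoing.setdefault((u, a), v) != v:
            return outgoing[(u, a)], v
        # two edges with the same target and label but different sources
        if incoming.setdefault((v, a), u) != u:
            return incoming[(v, a)], u
    return None


def stallings_graph(generators):
    """Build the bouquet of loops labeled by the generators, then fold it."""
    edges = set()
    fresh = 1                                  # next unused vertex name
    for h in generators:
        current = 0
        for position, letter in enumerate(h):
            last = position == len(h) - 1
            target = 0 if last else fresh      # the loop closes at the base vertex
            if not last:
                fresh += 1
            if letter.islower():
                edges.add((current, letter, target))
            else:                              # inverse letter: reverse the edge
                edges.add((target, letter.lower(), current))
            current = target
    while True:
        pair = find_fold(edges)
        if pair is None:
            break
        keep, drop = min(pair), max(pair)      # the base vertex 0 is never dropped
        edges = {(keep if u == drop else u, a, keep if v == drop else v)
                 for (u, a, v) in edges}
    vertices = {0} | {u for (u, _, _) in edges} | {v for (_, _, v) in edges}
    return vertices, edges


vertices, edges = stallings_graph(["ab", "aB"])
print(len(edges) - len(vertices) + 1)          # rank |E| - |V| + 1
```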

Some algebraic properties of H can be directly seen on its Stallings graph
$\Gamma(H)$. For instance, one can show that H is malnormal if and only if there
exists no non-empty reduced word u which labels a loop in two distinct vertices of
$\Gamma(H)$ [12, 15]. This property leads to an easy decision procedure of malnormality
for subgroups of a free group. We refer the reader to [STl [35l fim [2T] for more 
information about Stallings graphs. 

If h is a tuple of elements of F, let min(h) be the minimum length of an element
of h and let lcp(h) be the length of the longest common prefix between two words in
$h^{\pm}$, see Figure 1.¹ We say that h has the central tree property if 2 lcp(h) < min(h).
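
The quantities min(h) and lcp(h) are straightforward to compute, as the following Python sketch shows. It assumes the lowercase/uppercase encoding of letters and inverses, and it interprets "two words in $h^{\pm}$" as two distinct positions of the tuple $h^{\pm}$ (so a repeated generator yields a full-length common prefix); the function names are hypothetical.

```python
from itertools import combinations


def inverse(word: str) -> str:
    """Formal inverse of a word (uppercase letters encode inverses)."""
    return word[::-1].swapcase()


def lcp(h: list[str]) -> int:
    """Length of the longest common prefix between two words of h^{+-}."""
    words = [w for g in h for w in (g, inverse(g))]
    best = 0
    for u, v in combinations(words, 2):
        n = 0
        while n < min(len(u), len(v)) and u[n] == v[n]:
            n += 1
        best = max(best, n)
    return best


def has_central_tree_property(h: list[str]) -> bool:
    """Check the inequality 2 lcp(h) < min(h)."""
    return 2 * lcp(h) < min(len(g) for g in h)


print(lcp(["abab", "abba"]), has_central_tree_property(["abab", "abba"]))
```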



Figure 1. The Stallings graph of the subgroup generated by 
h = {ba~^cb'^a'^b~^,a^c^a~^cbc,c~^b~^aba~^c~^ba~^c^), has the 
central tree property and satisfies lcp(h) = 2. The origin is denoted
by • and the central tree is depicted in bold arrows. 


¹ This definition is closely related to the notion of trie of $h^{\pm}$. The height of the trie of
$h^{\pm}$ is 1 + lcp(h).

Proposition 1.2. Let $h = (h_1, \dots, h_k)$ be a tuple of elements of $F(A)$ with the
central tree property and let $H = \langle h \rangle$. Then the Stallings graph $\Gamma(H)$ consists of a
central tree of height t = lcp(h) and of k outer loops, one for each $h_i$, connecting
the length t prefix and the length t suffix of $h_i$ (two leaves of the central tree), of
length $|h_i| - 2t$ respectively. The set of vertices of the central tree can be identified
with the set of prefixes of length at most t of the words of $h^{\pm}$.

In particular, h is a basis of H. Moreover, if g is a basis of H also with the
central tree property, then $h^{\pm}$ and $g^{\pm}$ coincide up to the order of their elements.

Proof. The central tree property shows that the cancellation (folding) that
occurs when one considers the bouquet of $h_i$-labeled loops around the origin, stops
before canceling entirely any one of the $h_i$. The result follows immediately. □


Under the central tree property, we record an interesting sufficient condition 
for malnormality. 

Proposition 1.3. Let $h = (h_1, \dots, h_k)$ be a tuple of elements of $F(A)$ with the
central tree property and let $H = \langle h \rangle$. Let us assume additionally that 3 lcp(h) <
min(h) and that no word of length at least $\frac{1}{2}(\min(h) - 3\,\mathrm{lcp}(h))$ has several occurrences
as a factor of an element of $h^{\pm}$. Then H is malnormal.


Remark 1.4. In the proof below, and in several other statements and proofs
later in the paper, we consider words whose length is specified by an algebraic
expression which does not always compute to an integer (e.g., $\frac{1}{2}(\min(h) - 3\,\mathrm{lcp}(h))$).
To be rigorous, we should consider only the integer part of these expressions. For
the sake of simplicity, we dispense with this extra notation, and implicitly consider
that if a word of length $\ell$ is considered, then we mean that its length is $\lfloor \ell \rfloor$.


Proof. Let m = min(h) and t = lcp(h). Proposition 1.2 shows that $\Gamma(H)$
consists of a central tree of height t and of outer loops, one for each $h_i$, of length
$|h_i| - 2t \ge m - 2t$.

If H is not malnormal, then a word u labels a loop at two distinct vertices
of $\Gamma(H)$. Without loss of generality, u is cyclically reduced. Moreover, given the
particular geometry of $\Gamma(H)$, both loops visit the central tree. Without loss of
generality, we may assume that one of the u-labeled loops starts in the central tree,
at distance exactly t from the base vertex 1, and travels away from 1. In particular,
$|u| \ge m - 2t$, and if v is the prefix of u of length $m - 2t$, then v is a factor of some
$h_i^{\pm 1}$.

Let s be the start state of the second u-labeled loop: reading this loop starts
with reading the word v. Suppose that s is in the central tree: either reading u
(and v) from s takes us away from 1 towards a leaf of the central tree and into an
outer loop, and v is a factor of some $h_j^{\pm 1}$; or reading v from s moves us towards
1 for a distance at most t, after which the path travels away from 1, along a path
labeled by a factor of some $h_j^{\pm 1}$, for a distance at least $m - 3t$. In either case, a
factor of u of length $m - 3t > \frac{1}{2}(m - 3t)$ has two occurrences in $h^{\pm}$.

Suppose now that s is on an outer loop (say, associated to $h_j^{\pm 1}$) and that s'
is the first vertex of the central tree reached along the loop. If s' is reached after
reading a prefix of u of length greater than $\frac{1}{2}(m - 3t)$, then the prefix of v of length
$\frac{1}{2}(m - 3t)$ is a factor of $h_j^{\pm 1}$. Otherwise v labels a path from s which first reaches
s', then travels towards 1 in the central tree for a distance at most t, and thence
away from 1, along a path labeled by some $h_{j'}^{\pm 1}$, which it follows over a length at
least equal to $(m - 2t) - \frac{1}{2}(m - 3t) - t = \frac{1}{2}(m - 3t)$.

Thus, in every case, u contains a factor of length $\frac{1}{2}(m - 3t)$ with two distinct
occurrences as a factor of an element of $h^{\pm}$, and this concludes the proof. □


To conclude this section, we note that the properties discussed above are preserved
when going from a tuple h to a sub-tuple: say that a tuple g is contained in
a tuple h, written $g \le h$, if every element of g is an element of h.

Proposition 1.5. Let g, h be tuples of reduced words such that $g \le h$.

• If h has the central tree property, so does g.

• If h consists of cyclically reduced words and h has Property $C'(\lambda)$, then
so does g.

• If h has the central tree property, then $\langle g \rangle$ is a free factor of $\langle h \rangle$, and $\langle g \rangle$
is malnormal if $\langle h \rangle$ is.


Proof. The first two properties are immediate from the definition. Suppose
now that h has the central tree property. Then by Proposition 1.2, h is a basis of
$\langle h \rangle$, and by the first statement of the current proposition, g is a basis of $\langle g \rangle$. Since
$g \le h$, $\langle g \rangle$ is a free factor of $\langle h \rangle$.

In particular, $\langle g \rangle$ is malnormal in $\langle h \rangle$ (a free factor always is, for elementary
reasons). It is immediate from the definition that malnormality is transitive, so if
$\langle h \rangle$ is malnormal in F, then so is $\langle g \rangle$. □


2. Random models and generic properties 

We will discuss several models of randomness for finitely presented groups and 
finitely generated subgroups, or rather, for finite tuples of cyclically reduced words
(group presentations) and finite tuples of reduced words. In this section, we fix 
a general framework for these models of randomness and we survey some of the 
known results. 

2.1. Generic properties and negligible properties. Let us say that a
function f, defined on $\mathbb{N}$ and such that $\lim f(n) = 0$, is exponentially (resp. super-polynomially,
polynomially) small if $f(n) = o(e^{-dn})$ for some $d > 0$ (resp. $f(n) = o(n^{-d})$
for every positive integer d, $f(n) = o(n^{-d})$ for some positive integer d).

Given a sequence of probability laws $(\mathbb{P}_n)_n$ on a set S, we say that a subset
$X \subseteq S$ is negligible if $\lim_n \mathbb{P}_n(X) = 0$, and generic if its complement is negligible.²

We also say that X is exponentially (resp. super-polynomially, polynomially)
negligible if $\mathbb{P}_n(X)$ tends to 0 and is exponentially (resp. super-polynomially, polynomially)
small. And it is exponentially (resp. super-polynomially, polynomially)
generic if its complement is exponentially (resp. super-polynomially, polynomially)
negligible.

In this paper, the set S will be the set of all finite tuples of reduced words, or
cyclically reduced words, and the probability laws $\mathbb{P}_n$ will be such that every subset
is measurable: we will therefore not specify in the statements that we consider only
measurable sets.

The notions of genericity and negligibility have elementary closure properties
that we will use freely in the sequel. For instance, a superset of a generic set is
generic, as well as the intersection of finitely many generic sets. Dual properties
hold for negligible sets.

² This is the same notion as with high probability or with overwhelming probability, which are
used in the discrete probability literature.


2.2. The few-generator model and the density model. In this section, 
we review the results known on two random models, originally introduced to discuss 
finite presentations. We discuss more general models in Section 3 below.

2.2.1. The few-generator model. In the few-generator model, an integer $k \ge 1$ is
fixed, and we let $\mathbb{P}_n$ be the uniform probability on the set of k-tuples of words of F of
length at most n. Proposition 2.1 is established by elementary counting arguments,
see Gromov [10, Prop. 0.2.A] or Arzhantseva and Ol’shanskii [1, Lemma 3].

Proposition 2.1. Let $k \ge 1$, $0 < \alpha < \frac{1}{2}$, $2\alpha < \beta < 1$ and $0 < \lambda < 1$.
Then a k-tuple h of elements of F of length at most n picked uniformly at random,
exponentially generically satisfies the following properties:

• $\min(h) > \beta n$,

• $\mathrm{lcp}(h) < \alpha n$,

• no word of length $\lambda n$ has two occurrences as a factor of an element of $h^{\pm}$.


In view of Propositions 1.2 and 1.3 this yields the following corollary ([3], and
[12] for the malnormality statement).


Corollary 2.2. Let $k \ge 1$. If h is a k-tuple of elements of F of length at
most n picked uniformly at random and $H = \langle h \rangle$, then

• exponentially generically, h has the central tree property, and in particular,
$\Gamma(H)$ can be constructed in linear time (in $k \cdot n$), simply by computing the
initial cancellation of the elements of $h^{\pm}$; H is freely generated by the
elements of h, and H has rank k;

• exponentially generically, H is malnormal.

Moreover, if h and g generate the same subgroup, then exponentially generically,
$h^{\pm} = g^{\pm}$ up to the order of the elements in the tuples.
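
The few-generator model is easy to simulate, which gives an empirical feel for the first item of Corollary 2.2. The Python sketch below is an experiment under stated assumptions, not a proof: it uses the lowercase/uppercase encoding of letters and inverses, draws each word uniformly among reduced words of length at most n (the empty word included), and estimates how often the central tree property holds. All function names and parameter values are hypothetical.

```python
import random
from itertools import combinations


def random_reduced_word(length, r, rng):
    """Uniform random reduced word of the given length over r generators."""
    letters = [chr(ord("a") + i) for i in range(r)]
    alphabet = letters + [x.upper() for x in letters]
    word = []
    for _ in range(length):
        # any letter except the inverse of the previous one
        allowed = [x for x in alphabet if not word or x != word[-1].swapcase()]
        word.append(rng.choice(allowed))
    return "".join(word)


def random_tuple(k, n, r, rng):
    """k independent words, each uniform among reduced words of length at most n."""
    # number of reduced words of each length 0, 1, ..., n
    sizes = [1] + [2 * r * (2 * r - 1) ** (l - 1) for l in range(1, n + 1)]
    lengths = rng.choices(range(n + 1), weights=sizes, k=k)
    return [random_reduced_word(l, r, rng) for l in lengths]


def lcp(h):
    """Longest common prefix length between two words of h^{+-}."""
    words = [w for g in h for w in (g, g[::-1].swapcase())]
    best = 0
    for u, v in combinations(words, 2):
        n = 0
        while n < min(len(u), len(v)) and u[n] == v[n]:
            n += 1
        best = max(best, n)
    return best


def central_tree_frequency(k, n, r, trials, seed=0):
    """Fraction of sampled k-tuples satisfying 2 lcp(h) < min(h)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = random_tuple(k, n, r, rng)
        if 2 * lcp(h) < min(len(g) for g in h):
            hits += 1
    return hits / trials


print(central_tree_frequency(k=2, n=60, r=2, trials=200))
```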


The following statement follows from Proposition 1.5 and from Theorem 2.4
below (which is independent).


Corollary 2.3. In the few-generator model, if h is a k-tuple of cyclically
reduced words of length at most n, then

• for any $0 < \lambda < \frac{1}{2}$, h exponentially generically satisfies the small cancellation
property $C'(\lambda)$;

• exponentially generically, the group $\langle A \mid h \rangle$ is infinite, torsion-free, word-hyperbolic,
it has solvable word problem (by Dehn’s algorithm) and solvable
conjugacy problem.


2.2.2. The density model. In the density model, a density 0 < d < 1 is fixed, 
and a tuple of cyclically reduced elements of the n-sphere of density d is picked 
uniformly at random: that is, the tuple h consists of cyclically reduced words 
of length n. This model was introduced by Gromov [10] and complete proofs were
given by Ol’shanskii [25], Champetier [7] and Ollivier [23].

Theorem 2.4. Let $0 < \alpha < d < \beta < 1$. In the density model, the following
properties hold:

(1) exponentially generically, every word of length $\alpha n$ occurs as a factor of a
word in h, and some word of length $\beta n$ fails to occur as a factor of a word
in $h^{\pm}$;

(2) if $d < \frac{1}{2}$ then exponentially generically, h satisfies property $C'(\lambda)$ for
$\lambda > 2d$ but h does not satisfy $C'(\lambda)$ for $\lambda < 2d$; in particular, at density
$d < \frac{1}{12}$, h satisfies exponentially generically property $C'(\frac{1}{6})$ and the group
$\langle A \mid h \rangle$ is infinite and hyperbolic; and at density $d > \frac{1}{12}$, exponentially
generically, h does not satisfy $C'(\frac{1}{6})$;

(3) at density $d > \frac{1}{2}$, exponentially generically, $\langle h \rangle$ is equal to $F(A)$, or has
index 2. In particular, the group $\langle A \mid h \rangle$ is either trivial or $\mathbb{Z}/2\mathbb{Z}$;

(4) at density $d < \frac{1}{2}$, the group $\langle A \mid h \rangle$ is generically infinite and hyperbolic.


Properties (1)-(3) in Theorem 2.4 are obtained by counting arguments. Property
(4) is the “hard part” of the theorem, where hyperbolicity does not follow from
a small cancellation property.

As pointed out by Ollivier [24, Sec. I.2.c], the statement of Theorem 2.4 still
holds if a tuple of cyclically reduced elements is chosen uniformly at random at
density d in the n-ball rather than in the n-sphere (that is, it consists of words of
length at most n). We will actually verify this fact again in Section 3.6.


3. A general probabilistic model 

We introduce a fairly general probabilistic model, which generalizes both the 
few-generator and the density models. 

3.1. Prefix-heavy sequences of measures on reduced words. For every
reduced word $u \in \mathcal{R}$, let $\mathcal{P}(u)$ be the set of all reduced words v of which u is a prefix
(that is: $\mathcal{P}(u) = u\tilde A^* \cap \mathcal{R}$). Let also $\mathcal{P}_n(u)$ be the set $\mathcal{R}_n \cap \mathcal{P}(u)$. The notation $\mathcal{P}$
can also be extended to a set U of reduced words: $\mathcal{P}(U) = \bigcup_{u \in U} \mathcal{P}(u)$.

Let $(\mathbb{R}_n)_{n \ge 0}$ be a sequence of probability measures on $\mathcal{R}$ and let $C \ge 1$ and
$\alpha \in (0,1)$. We say that the sequence $(\mathbb{R}_n)_{n \ge 0}$ is a prefix-heavy sequence of measures
on $\mathcal{R}$ of parameters $(C, \alpha)$ if:

(1) for every $n \ge 0$, the support of the measure $\mathbb{R}_n$ is included in $\mathcal{R}_n$;

(2) for every $n \ge 0$ and for every $u \in \mathcal{R}$, if $\mathbb{R}_n(\mathcal{P}(u)) \ne 0$ then for every $v \in \mathcal{R}$

$$\mathbb{R}_n(\mathcal{P}(uv) \mid \mathcal{P}(u)) \le C\alpha^{|v|}.$$

This prefix-oriented definition is rather natural if one thinks of a source as generating
reduced words from left to right, as is usual in information theory.

Remark 3.1. Taking $u = \varepsilon$ in the definition yields $\mathbb{R}_n(\mathcal{P}(v)) \le C\alpha^{|v|}$. For
$n = |v|$, we have $\mathcal{P}(v) \cap \mathcal{R}_n = \{v\}$, so the probability of v decreases exponentially
with the length of v.

Example 3.2. The sequence of uniform distributions on $\mathcal{R}_n$ is a prefix-heavy
sequence of measures with parameters $C = 1$ and $\alpha = \frac{1}{2r-1}$. Indeed, if u is a
reduced word of length at most $n \ge 0$ (for a longer u, $\mathbb{R}_n(\mathcal{P}(u)) = 0$), and if uv is
reduced, we have

$$\mathbb{R}_n(\mathcal{P}(uv) \mid \mathcal{P}(u)) = \begin{cases} \dfrac{1}{(2r-1)^{|v|}} & \text{if } |u| + |v| \le n \text{ and } u \ne \varepsilon, \\[2mm] \dfrac{1}{2r(2r-1)^{|v|-1}} & \text{if } |v| \le n \text{ and } u = \varepsilon, \\[2mm] 0 & \text{otherwise.} \end{cases}$$
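
The conditional probabilities of Example 3.2 can be verified by enumeration for small parameters. The Python sketch below uses exact rational arithmetic and the lowercase/uppercase encoding of letters and inverses; it assumes that u is a reduced word of length at most n, and its function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def reduced_words(r, n):
    """All reduced words of length n over r generators."""
    letters = [chr(ord("a") + i) for i in range(r)]
    alphabet = letters + [x.upper() for x in letters]
    return ["".join(w) for w in product(alphabet, repeat=n)
            if all(w[i] != w[i + 1].swapcase() for i in range(n - 1))]


def conditional_probability(r, n, u, v):
    """R_n(P(uv) | P(u)) for the uniform distribution on reduced words of length n."""
    words = reduced_words(r, n)
    with_u = [w for w in words if w.startswith(u)]          # the set P_n(u)
    with_uv = [w for w in with_u if w.startswith(u + v)]    # the set P_n(uv)
    return Fraction(len(with_uv), len(with_u))


# r = 2: expect 1/(2r-1)^{|v|} = 1/9 when u is non-empty,
# and 1/(2r(2r-1)^{|v|-1}) = 1/12 when u is the empty word.
print(conditional_probability(2, 5, "ab", "Ab"))
print(conditional_probability(2, 5, "", "ab"))
```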


Example 3.3. By a similar computation, one verifies that the sequence of 
uniform distributions on $\mathcal{C}_n$, the cyclically reduced words, is also a prefix-heavy
sequence of measures, with parameters C = a = (see Section 1.1). 


For the rest of this section, we fix a sequence of measures $(\mathbb{R}_n)_{n \ge 0}$ on $\mathcal{R}$, which
is prefix-heavy with parameters $(C, \alpha)$. All probabilities refer to this sequence, that
is: the probability of a subset of $\mathcal{R}_n$ is computed according to $\mathbb{R}_n$.


Remark 3.4. If X and Y are subsets of $\mathcal{R}$, the notation $\mathbb{R}_n(X \mid Y)$ is technically
defined only if $\mathbb{R}_n(Y) \ne 0$. To avoid stating cumbersome hypotheses, we
adopt the convention that $\mathbb{R}_n(X \mid Y)\,\mathbb{R}_n(Y) = 0$ whenever $\mathbb{R}_n(Y) = 0$.


3.2. Repeated factors in random reduced words. Let us first evaluate
the probability of occurrence of prescribed, non-overlapping factors in a reduced
word. Let $m \ge 0$, $\vec v = (v_1, \dots, v_m)$ be a vector of non-empty reduced words
and $\vec\imath = (i_1, \dots, i_m)$ be a vector of integers. We denote by $E(\vec v, \vec\imath)$ the set
of reduced words admitting $v_j$ as a factor at position $i_j$ for every
$1 \le j \le m$ (if $m = 0$, then $E(\vec v, \vec\imath) = \mathcal{R}$). If $n \ge 1$, we also write $E_n(\vec v, \vec\imath)$ for
$E(\vec v, \vec\imath) \cap \mathcal{R}_n$.

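Lemma 3.5. Let $\vec v = (v_1, \dots, v_m)$ be a sequence of non-empty reduced words and $\vec\imath = (i_1, \dots, i_m)$ be a sequence of integers satisfying

$$1 \le i_1 < i_1 + |v_1| \le i_2 < i_2 + |v_2| \le \dots < i_m + |v_m| \le n + 1.$$

Then the following inequality holds:

$$\mathbb{R}_n(E(\vec v, \vec\imath)) \le C^m \alpha^{|v_1| + \dots + |v_m|}.$$

In addition, if $m \ge 1$ and $\vec x = (v_1, \dots, v_{m-1})$ and $\vec\jmath = (i_1, \dots, i_{m-1})$, then

$$\mathbb{R}_n(E(\vec v, \vec\imath)) \le C\alpha^{|v_m|}\,\mathbb{R}_n(E(\vec x, \vec\jmath)).$$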

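Proof. The proof is by induction on $m$ and the case $m = 0$ is trivial. We now assume that $m \ge 1$ and that the inequality holds for vectors of length $m - 1$. Since $(\mathbb{R}_n)_n$ is prefix-heavy, we have

$$\mathbb{R}_n(\mathcal{P}(u v_m)) = \mathbb{R}_n(\mathcal{P}(u v_m) \mid \mathcal{P}(u))\,\mathbb{R}_n(\mathcal{P}(u)) \le C\alpha^{|v_m|}\,\mathbb{R}_n(\mathcal{P}(u))$$

for each $u$. Since $E(\vec v, \vec\imath) = \mathcal{P}(E_{i_m - 1}(\vec x, \vec\jmath)\, v_m)$, summing the previous inequality over all $u \in E_{i_m - 1}(\vec x, \vec\jmath)$ yields

$$\mathbb{R}_n(E(\vec v, \vec\imath)) \le C\alpha^{|v_m|}\,\mathbb{R}_n(\mathcal{P}(E_{i_m - 1}(\vec x, \vec\jmath))) = C\alpha^{|v_m|}\,\mathbb{R}_n(E(\vec x, \vec\jmath))$$

since $n \ge i_m + |v_m| - 1$. This concludes the proof. □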


Corollary 3.6. Let $v_1, \dots, v_m$ be non-empty reduced words. The probability that a word of length $n$ admits $v_1, \dots, v_m$ in that order as non-overlapping factors is at most

$$C^m n^m \alpha^{|v_1| + \dots + |v_m|}.$$

Proof. This is a direct consequence of Lemma 3.5, summing over all possible position vectors. □
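For the uniform distribution, where $C = 1$ and $\alpha = 1/(2r-1)$ by Example 3.2, the bound of Lemma 3.5 can be compared with an exact count on a small instance. The sketch below is only an illustration under assumed conventions: letters are the integers $\pm 1, \dots, \pm r$, positions are 1-based as in the text, and the function names are hypothetical.

```python
from fractions import Fraction


def reduced_words(r, n):
    """All reduced words of length n over the letters ±1..±r (inverse = negation)."""
    letters = [x for x in range(-r, r + 1) if x != 0]
    words = [()]
    for _ in range(n):
        words = [w + (x,) for w in words for x in letters if not w or w[-1] != -x]
    return words


def has_factor_at(w, v, i):
    """True if v occurs in w at the 1-based position i."""
    return w[i - 1:i - 1 + len(v)] == v


def prob_prescribed_factors(r, n, factors, positions):
    """Uniform probability of E_n(factors, positions), by exhaustive counting."""
    words = reduced_words(r, n)
    hits = sum(
        all(has_factor_at(w, v, i) for v, i in zip(factors, positions))
        for w in words
    )
    return Fraction(hits, len(words))


r, n = 2, 6
alpha = Fraction(1, 2 * r - 1)
factors = ((1, 2), (-1,))   # v_1 = ab at position 2, v_2 = a^{-1} at position 5
positions = (2, 5)          # satisfies 1 <= 2 < 4 <= 5 < 6 <= n + 1
probability = prob_prescribed_factors(r, n, factors, positions)
bound = alpha ** sum(len(v) for v in factors)   # C^m * alpha^(|v_1| + |v_2|) with C = 1
```

On this instance the exact probability is strictly below the bound, which is expected since the lemma only claims an upper estimate.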
We now consider repeated non-overlapping occurrences of factors of a prescribed length.


Lemma 3.7. Let $1 \le i, j, t \le n$ be such that $i + t \le j$. The probability that a word of length $t$ occurs (resp. a word of length $t$ and its inverse occur) at positions $i$ and $j$ in a reduced word of length $n$ is at most equal to $C\alpha^t$.

The probability that a reduced word of length $n$ has two non-overlapping occurrences of a factor of length $t$ (resp. occurrences of a factor of length $t$ and its inverse) is at most equal to $Cn^2\alpha^t$.


Proof. Let $E_n(t, i, j)$ be the set of reduced words of length $n$ in which the same factor of length $t$ occurs at positions $i$ and $j$. Then $E_n(t, i, j)$ is the disjoint union of the sets $E_n((v, v), (i, j))$, where $v$ runs over $\mathcal{R}_t$. By Lemma 3.5 we have

$$\mathbb{R}_n(E_n(t, i, j)) = \sum_{v \in \mathcal{R}_t} \mathbb{R}_n(E_n((v, v), (i, j))) \le \sum_{v \in \mathcal{R}_t} C\alpha^t\,\mathbb{R}_n(E_n((v), (i))) = C\alpha^t,$$

where the last equality is due to the fact that the $E_n((v), (i))$ form a partition of $\mathcal{R}_n$ when $v$ runs over $\mathcal{R}_t$.

The same reasoning applied to the vectors $(v, v^{-1})$ yields the analogous inequality for words containing non-overlapping occurrences of a word and its inverse.

The last part of the statement follows by summing over all possible values of $i$ and $j$. □
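The first statement of Lemma 3.7 can be illustrated numerically in the uniform case ($C = 1$, $\alpha = 1/(2r-1)$). The following sketch makes the same assumptions as before (letters encoded as $\pm 1, \dots, \pm r$, 1-based positions) and uses hypothetical helper names; it computes the exact probability that the factors of length $t$ at positions $i$ and $j$ coincide, or are inverse to each other.

```python
from fractions import Fraction


def reduced_words(r, n):
    """All reduced words of length n over the letters ±1..±r (inverse = negation)."""
    letters = [x for x in range(-r, r + 1) if x != 0]
    words = [()]
    for _ in range(n):
        words = [w + (x,) for w in words for x in letters if not w or w[-1] != -x]
    return words


def inverse(w):
    """Inverse of a word: reverse it and invert each letter."""
    return tuple(-x for x in reversed(w))


def prob_repeated_factor(r, n, t, i, j, inverted=False):
    """Uniform probability that the factor at position j equals the factor at
    position i (or its inverse when inverted=True); positions are 1-based."""
    words = reduced_words(r, n)
    hits = 0
    for w in words:
        first = w[i - 1:i - 1 + t]
        second = w[j - 1:j - 1 + t]
        if second == (inverse(first) if inverted else first):
            hits += 1
    return Fraction(hits, len(words))


r, n, t, i, j = 2, 6, 2, 1, 4       # i + t <= j, so the occurrences do not overlap
alpha_t = Fraction(1, 2 * r - 1) ** t
p_same = prob_repeated_factor(r, n, t, i, j)
p_inverse = prob_repeated_factor(r, n, t, i, j, inverted=True)
```

Both probabilities stay below $\alpha^t$, in line with the lemma for $C = 1$.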


Applying Lemma 3.7 with $i = 1$ and $j = n - t + 1$, we get the following useful statement.


Corollary 3.8. For all positive integers $n, t$ such that $n \ge 2t$, the probability that a reduced word $u \in \mathcal{R}_n$ is of the form $v w v^{-1}$, for some word $v$ of length $t$, is at most $C\alpha^t$.


Finally, we also estimate the probability that a word has two overlapping occurrences of a factor. Note that we do not need to consider overlapping occurrences of a word $v$ and its inverse, since a reduced word cannot overlap with its inverse.


Lemma 3.9. Let $1 \le t < n$. The probability that a reduced word of length $n$ has overlapping occurrences of a factor of length $t$ is at most $Cnt\alpha^t$.


Proof. If a word $v$ overlaps with itself, more precisely, if $xv = vz$ for some words $x, z$ such that $0 < |x| = |z| < |v|$, then it is a classical result from combinatorics on words that $v = x^s y$ where $s = \lfloor |v| / |x| \rfloor \ge 1$ and $y$ is the prefix of $x$ of length $|v| - s|x|$ (see Figure 2).

Figure 2. A classical result from combinatorics of words: if $xv = vz$ with $0 < |x| < |v|$, then $v$ is of the form $v = x^s y$ for some positive integer $s$ and some prefix $y$ of $x$.
It follows that, if a reduced word $u$ has (overlapping) occurrences of a factor $v$ of length $t$ at positions $i$ and $j$ ($j < i + t$), then $u$ admits a factor of the form $xv$ at position $i$, where $x$ is the prefix of $v$ of length $j - i$. Note that, once $t$ and $j - i$ are fixed, $v$ is entirely determined by $x$. Therefore this occurs with probability

$$P \le \sum_{i=1}^{n-t} \sum_{j=i+1}^{i+t-1} \sum_{x \in \mathcal{R}_{j-i}} \mathbb{R}_n(E((xv), (i))) = \sum_{i=1}^{n-t} \sum_{j=i+1}^{i+t-1} \sum_{x \in \mathcal{R}_{j-i}} \mathbb{R}_n(E((x, v), (i, j))).$$

It follows that

$$P \le \sum_{i=1}^{n} \sum_{j=i+1}^{i+t-1} \sum_{x \in \mathcal{R}_{j-i}} C\alpha^t\,\mathbb{R}_n(E((x), (i))) = \sum_{i=1}^{n} \sum_{j=i+1}^{i+t-1} C\alpha^t \le Cnt\alpha^t,$$

by Lemma 3.5 and using the fact that the $E_n((x), (i))$ form a partition of $\mathcal{R}_n$ when $x$ runs over $\mathcal{R}_{j-i}$. □
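The periodicity fact used at the start of the proof of Lemma 3.9 is easy to test exhaustively on short words. The sketch below is an illustration on ordinary strings over a two-letter alphabet (reducedness plays no role in this fact); the function names are hypothetical. A self-overlap of $v$ with shift $p = |x|$ means that $v$ without its first $p$ letters equals $v$ without its last $p$ letters.

```python
from itertools import product


def self_overlap_decomposition(v, p):
    """If xv = vz holds with |x| = p and 0 < p < |v|, return (x, s, y) such that
    v = x^s y, s = floor(|v| / p) and y is a prefix of x; otherwise return None."""
    if not (0 < p < len(v)) or v[p:] != v[:len(v) - p]:
        return None
    x = v[:p]
    s = len(v) // p
    y = x[:len(v) - s * p]
    return x, s, y


def check_all(max_len, alphabet="ab"):
    """Verify v = x^s y for every self-overlapping word of length <= max_len."""
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            v = "".join(letters)
            for p in range(1, length):
                decomposition = self_overlap_decomposition(v, p)
                if decomposition is not None:
                    x, s, y = decomposition
                    if x * s + y != v or not x.startswith(y):
                        return False
    return True


example = self_overlap_decomposition("abaabaab", 3)
```

For instance, `abaabaab` overlaps itself with shift 3, and the decomposition has $x = $ `aba`, $s = 2$ and $y = $ `ab`.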


3.3. Repeated cyclic factors in random reduced words. A word $v$ is a cyclic factor of a word $u$ if either $u \in A^* v A^*$, or $v = v_1 v_2$ and $u \in v_2 A^* v_1$, in which case we say that $v$ is a straddling factor. For now, we only assume that $u$ is reduced, but we will be ultimately interested in the cyclically reduced case, see Corollary 3.14.

Lemma 3.10. Let $1 \le i, t \le n$ be such that $i + t \le n$ and let $v$ be a reduced word of length $t$. Then the probability that $v$ is a cyclic factor at position $i$ of an element of $\mathcal{R}_n$ is at most $(Cn + C^2 t)\alpha^t \le 2C^2 n\alpha^t$.


Proof. The probability that $v$ occurs as a (regular) factor of an element of $\mathcal{R}_n$ is at most $Cn\alpha^t$ by Corollary 3.6.

On the other hand, $v$ occurs as a straddling factor of $u \in \mathcal{R}_n$ if $v = v_2 v_1$, with $1 \le \ell = |v_2| < t$ and $u \in v_1 A^* v_2$, that is, $u \in E((v_1, v_2), (1, n - \ell + 1))$. By Lemma 3.5 this happens with probability at most $C^2\alpha^t$. Summing over the possible values of $\ell$, we find that $v$ occurs as a straddling factor of an element of $\mathcal{R}_n$ with probability at most $C^2 t\alpha^t$.

Therefore the probability that $v$ occurs in $u$ as a cyclic factor is at most $(Cn + C^2 t)\alpha^t$, as announced. □


We now consider multiple occurrences of cyclic factors of a given length. 

Lemma 3.11. Let $1 \le t < n$. The probability that a reduced word of length $n$ has two non-overlapping occurrences of a cyclic factor of length $t$ (resp. an occurrence of a cyclic factor of length $t$ and its inverse) is at most $(Cn^2 + C^2 nt)\alpha^t \le 2C^2 n^2\alpha^t$.

Proof. Again there are several cases, depending on whether the occurrences of the word (or the word and its inverse) are both standard factors, or one of them is straddling.

The probability that a reduced word $u \in \mathcal{R}_n$ admits two non-overlapping occurrences of a (standard) factor of length $t$ (resp. occurrences of a factor of length $t$ and its inverse) is at most $Cn^2\alpha^t$ by Lemma 3.7.

We now consider the situation where $u$ has two occurrences of the same word of length $t$, one as a standard factor and one straddling: there exist integers $\ell, i$ and reduced words $v_1, v_2$ such that $0 \le \ell \le t$, $\ell \le i \le n - 2t + \ell$, $|v_2| = \ell$, $|v_1 v_2| = t$ and

$$u \in E((v_2, v_1 v_2, v_1), (1, i, n - t + \ell + 1)) = E((v_2, v_1, v_2, v_1), (1, i, i + t - \ell, n - t + \ell + 1)).$$

Applying Lemma 3.5 twice, we find that the probability of this event according to $\mathbb{R}_n$ is at most equal to $C^2\alpha^t\,\mathbb{R}_n(E((v_2, v_1), (1, i)))$.

Then the probability $P$ that a word in $\mathcal{R}_n$ admits two non-overlapping occurrences of a factor of length $t$, one standard and one straddling, is bounded above by the sum of these values when $\ell, i, v_1, v_2$ run over all possible values:

$$P \le \sum_{\ell=0}^{t} \sum_{i=\ell}^{n-2t+\ell} \sum_{v_2 \in \mathcal{R}_\ell} \sum_{v_1 \in \mathcal{R}_{t-\ell}} C^2\alpha^t\,\mathbb{R}_n(E((v_2, v_1), (1, i))).$$

For fixed values of $\ell$ and $i$, $\mathcal{R}_n$ is the disjoint union of the $E_n((v_2, v_1), (1, i))$ when $v_2$ runs over $\mathcal{R}_\ell$ and $v_1$ runs over $\mathcal{R}_{t-\ell}$. So we get

$$P \le \sum_{\ell=0}^{t} \sum_{i=\ell}^{n-2t+\ell} C^2\alpha^t \le C^2 nt\alpha^t.$$

Thus the probability that a reduced word of length $n$ has two non-overlapping occurrences of a word of length $t$ as cyclic factors is at most equal to $(Cn^2 + C^2 nt)\alpha^t \le 2C^2 n^2\alpha^t$, as announced.

Finally, we consider the situation where a factor of length $t$ and its inverse occur in $u$, with one of the occurrences straddling: that is, there exist integers $\ell, i$ and reduced words $v_1, v_2$ such that $0 < \ell < t$, $\ell \le i \le n - 2t + \ell$, $|v_2| = \ell$, $|v_1 v_2| = t$ and $u$ lies in

$$E((v_2, v_2^{-1} v_1^{-1}, v_1), (1, i, n - t + \ell + 1)) = E((v_2, v_2^{-1}, v_1^{-1}, v_1), (1, i, i + \ell, n - t + \ell + 1)).$$

As above, the probability of this event according to $\mathbb{R}_n$ is at most

$$C\alpha^{t-\ell}\,\mathbb{R}_n(E((v_2, v_2^{-1}, v_1^{-1}), (1, i, i + \ell)))$$

and the probability $P'$ that a reduced word of length $n$ has non-overlapping occurrences of a word of length $t$ and of its inverse as cyclic factors, with one of them straddling, satisfies

$$P' \le \sum_{\ell=1}^{t-1} \sum_{i=\ell}^{n-2t+\ell} \sum_{v_2 \in \mathcal{R}_\ell} \sum_{v_1 \in \mathcal{R}_{t-\ell}} C\alpha^{t-\ell}\,\mathbb{R}_n(E((v_2, v_2^{-1}, v_1^{-1}), (1, i, i + \ell))).$$

For fixed values of $\ell$, $i$ and $v_2$, $E_n((v_2, v_2^{-1}), (1, i))$ is the disjoint union of the $E_n((v_2, v_2^{-1}, v_1^{-1}), (1, i, i + \ell))$ when $v_1$ runs over $\mathcal{R}_{t-\ell}$. Therefore we have

$$P' \le \sum_{\ell=1}^{t-1} \sum_{i=\ell}^{n-2t+\ell} \sum_{v_2 \in \mathcal{R}_\ell} C\alpha^{t-\ell}\,\mathbb{R}_n(E((v_2, v_2^{-1}), (1, i))).$$

By Lemma 3.5 again, $\mathbb{R}_n(E((v_2, v_2^{-1}), (1, i))) \le C\alpha^{\ell}\,\mathbb{R}_n(E((v_2), (1)))$ and we get, by the same reasoning as above,

$$P' \le \sum_{\ell=1}^{t-1} \sum_{i=\ell}^{n-2t+\ell} \sum_{v_2 \in \mathcal{R}_\ell} C^2\alpha^t\,\mathbb{R}_n(E((v_2), (1))) = \sum_{\ell=1}^{t-1} \sum_{i=\ell}^{n-2t+\ell} C^2\alpha^t \le C^2 nt\alpha^t.$$

Thus the probability that a reduced word of length $n$ has an occurrence of a word of length $t$ and its inverse as a cyclic factor is, again, at most equal to $(Cn^2 + C^2 nt)\alpha^t \le 2C^2 n^2\alpha^t$, as announced. □


Finally, we give an upper bound to the probability that a reduced word has 
overlapping occurrences of a cyclic factor of length t (observing again that a reduced 
word cannot have overlapping occurrences of a (cyclic) factor and its inverse). 
Lemma 3.12. Let $1 \le t < n$. The probability that a reduced word of length $n$ has overlapping occurrences of a cyclic factor of length $t$ is at most equal to $(Cnt + 2C^2 t^2)\alpha^t \le 3C^2 nt\alpha^t$.

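Proof. The probability that a reduced word of length $n$ has overlapping occurrences of a non-straddling factor of length $t$ is at most $Cnt\alpha^t$ by Lemma 3.9.

Let us now assume that the reduced word $u \in \mathcal{R}_n$ has overlapping occurrences of a cyclic factor $v$ of length $t$, with one at least of these occurrences straddling. Note that any cyclic factor of $u$ is a factor of $u^2$. Therefore, using the same arguments as for Lemma 3.9, $u$ has a straddling cyclic factor of the form $xv = x^{s+1}y$, where $|x| > 0$, $y$ is a prefix of $x$ and $s \ge 1$. In particular, $v = x^s y$ and $t = s|x| + |y|$.

It follows that $u$ is in $v_2 A^* v_1$, for some $v_1, v_2$ such that $v_1 v_2 = x^{s+1}y$. Denote by $\mathrm{pref}_\ell(z)$ and $\mathrm{suff}_\ell(z)$ the prefix and the suffix of length $\ell$ of a word $z$. Then there exist a cyclic conjugate $z$ of $x$ and integers $0 \le h, \ell < |z| = |x|$ and $m, m' \ge 0$ such that $v_1 = \mathrm{suff}_h(z)\, z^{m'}$ and $v_2 = z^m\, \mathrm{pref}_\ell(z)$. Note that $x^{s+1}y = \mathrm{suff}_h(z)\, z^{m+m'}\, \mathrm{pref}_\ell(z)$ and

$$h + \ell \equiv |y| \pmod{|z|},$$

$$m + m' = \begin{cases} s + 1 & \text{if } h + \ell = |y|, \\ s & \text{if } h + \ell = |z| + |y|, \end{cases}$$

$$t + |z| = (m + m')|z| + h + \ell.$$

Observe also that $|y|$ is determined by $|z|$ ($|y| \equiv t \pmod{|z|}$), that $h$ is determined by $\ell$ and $|z|$, and that $m'$ is determined by $m$, $\ell$ and $|z|$. Then

$$u \in \bigcup_{k=1}^{t-1} \bigcup_{\ell=0}^{k-1} \bigcup_{m=0}^{1 + \lfloor t/k \rfloor} \bigcup_{z \in \mathcal{R}_k} X_{z,\ell,m}, \quad\text{where}$$

$$X_{z,\ell,m} = E((z^m\, \mathrm{pref}_\ell(z),\ \mathrm{suff}_h(z)\, z^{m'}),\ (1,\ n - m'|z| - h + 1))$$

and $h$ and $m'$ take the values imposed by those of $k = |z|$, $\ell$ and $m$. In particular, the probability $P$ that a reduced word in $\mathcal{R}_n$ has overlapping occurrences of a cyclic factor of length $t$, with at least one of these occurrences straddling, satisfies

$$P \le \sum_{k=1}^{t-1} \sum_{\ell=0}^{k-1} \sum_{m=0}^{1 + \lfloor t/k \rfloor} \sum_{z \in \mathcal{R}_k} \mathbb{R}_n(X_{z,\ell,m}).$$

If $m \ge 1$, then

$$X_{z,\ell,m} = E((z,\ z^{m-1}\, \mathrm{pref}_\ell(z),\ \mathrm{suff}_h(z)\, z^{m'}),\ (1,\ |z| + 1,\ n - m'|z| - h + 1))$$

and a double application of Lemma 3.5 shows that

$$\mathbb{R}_n(X_{z,\ell,m}) \le C^2 \alpha^{m'|z| + h}\, \alpha^{(m-1)|z| + \ell}\, \mathbb{R}_n(E((z), (1))) = C^2\alpha^t\, \mathbb{R}_n(E((z), (1))).$$

Summing these over $z \in \mathcal{R}_k$ (with $k$, $\ell$ and $m$ fixed, $m \ge 1$), we get

$$\sum_{z \in \mathcal{R}_k} \mathbb{R}_n(X_{z,\ell,m}) \le \sum_{z \in \mathcal{R}_k} C^2\alpha^t\, \mathbb{R}_n(E((z), (1))) \le C^2\alpha^t,$$

since $\mathcal{R}_n$ is partitioned by the $E_n((z), (1))$ ($z \in \mathcal{R}_k$).

If $m = 0$ and $h + \ell = |y|$, then $m'|z| = t + |z| - |y|$ and we note that

$$X_{z,\ell,0} = E((\mathrm{pref}_\ell(z),\ \mathrm{suff}_h(z)\, z^{m'}),\ (1,\ n - t - |z| + \ell + 1))$$
$$\subseteq E((\mathrm{pref}_\ell(z),\ \mathrm{suff}_h(z),\ \mathrm{suff}_{|y|}(z)\, z^{m'-1}),\ (1,\ n - t - |z| + \ell + 1,\ n - t + 1)).$$

By Lemma 3.5, we get

$$\mathbb{R}_n(X_{z,\ell,0}) \le C\alpha^t\, \mathbb{R}_n(E((\mathrm{pref}_\ell(z),\ \mathrm{suff}_h(z)),\ (1,\ n - t - |z| + \ell + 1))).$$

Summing over all $z \in \mathcal{R}_k$ ($k$ and $\ell$ fixed), we get

$$\sum_{z \in \mathcal{R}_k} \mathbb{R}_n(X_{z,\ell,0}) \le \sum_{z \in \mathcal{R}_k} C\alpha^t\, \mathbb{R}_n(E((\mathrm{pref}_\ell(z),\ \mathrm{suff}_h(z)),\ (1,\ n - t - k + \ell + 1)))$$
$$\le \sum_{z_1 \in \mathcal{R}_\ell} \sum_{z_2 \in \mathcal{R}_h} C\alpha^t\, \mathbb{R}_n(E((z_1, z_2),\ (1,\ n - t - k + \ell + 1)))$$
$$\le C\alpha^t,$$

since $\mathcal{R}_n$ is partitioned by the $E_n((z_1, z_2), (1, n - t - k + \ell + 1))$ ($z_1 \in \mathcal{R}_\ell$, $z_2 \in \mathcal{R}_h$).

Finally, if $m = 0$ and $h + \ell = |z| + |y|$, then $m'|z| = t - |y|$. Therefore

$$X_{z,\ell,0} = E((\mathrm{pref}_\ell(z),\ \mathrm{suff}_h(z)\, z^{m'}),\ (1,\ n - t - |z| + \ell + 1))$$
$$= E((\mathrm{pref}_\ell(z),\ \mathrm{pref}_{|z|-\ell}(\mathrm{suff}_h(z)),\ \mathrm{suff}_{|y|}(z)\, z^{m'}),\ (1,\ n - t - |z| + \ell + 1,\ n - t + 1)).$$

By Lemma 3.5, this yields

$$\mathbb{R}_n(X_{z,\ell,0}) \le C\alpha^t\, \mathbb{R}_n(E((\mathrm{pref}_\ell(z),\ \mathrm{pref}_{|z|-\ell}(\mathrm{suff}_h(z))),\ (1,\ n - t - |z| + \ell + 1))).$$

As in the previous case, summing over all $z \in \mathcal{R}_k$ ($k$ and $\ell$ fixed) yields

$$\sum_{z \in \mathcal{R}_k} \mathbb{R}_n(X_{z,\ell,0}) \le C\alpha^t.$$

Then we get the following upper bound for the probability $P$:

$$P \le \sum_{k=1}^{t-1} \sum_{\ell=0}^{k-1} \sum_{m=1}^{1 + \lfloor t/k \rfloor} C^2\alpha^t + \sum_{k=1}^{t-1} \sum_{\ell=0}^{k-1} C\alpha^t$$
$$\le \tfrac{3}{2} C^2 t(t-1)\alpha^t + \tfrac{1}{2} C t(t-1)\alpha^t$$
$$\le 2C^2 t(t-1)\alpha^t.$$

This concludes the proof. □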

In order to extend the results of this section to cyclically reduced words, we 
need an additional hypothesis, essentially stating that the probability of cyclically 
reduced words does not vanish. In fact, we have the following general result. 


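Lemma 3.13. Let $(\mathbb{R}_n)_{n \ge 0}$ be a sequence of measures satisfying $\liminf_n \mathbb{R}_n(\mathcal{C}_n) = p > 0$. Let $X$ be a subset of $\mathcal{R}$. Then for each $\delta > 1$ and for every large enough $n$, the probability $\mathbb{R}_n(X \mid \mathcal{C})$ that a cyclically reduced word of length $n$ is in $X$ is at most equal to $\frac{\delta}{p}\mathbb{R}_n(X)$. In particular, if $X$ is exponentially (resp. super-polynomially, polynomially, simply) negligible, then so is $X \cap \mathcal{C}$ in $\mathcal{C}$.

Proof. By definition, $\mathbb{R}_n(X \mid \mathcal{C}) = \mathbb{R}_n(X \cap \mathcal{C} \mid \mathcal{C}) = \dfrac{\mathbb{R}_n(X \cap \mathcal{C})}{\mathbb{R}_n(\mathcal{C})} \le \dfrac{\delta}{p}\mathbb{R}_n(X)$ for every large enough $n$, which concludes the proof. □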
The following statement is an immediate consequence.

Corollary 3.14. Let $(\mathbb{R}_n)_{n \ge 0}$ be a prefix-heavy sequence of parameters $(C, \alpha)$, with the property that $\liminf_n \mathbb{R}_n(\mathcal{C}_n) = p > 0$. Then for every $\delta > 1$ and every large enough $n$, the probability that a cyclically reduced word of length $n$ has two non-overlapping occurrences of a cyclic factor of length $t$ (resp. an occurrence of a cyclic factor of length $t$ and its inverse, two overlapping occurrences of a cyclic factor of length $t$) is at most $\frac{\delta}{p}(Cn^2 + C^2 nt)\alpha^t$ (resp. $\frac{\delta}{p}(Cn^2 + C^2 nt)\alpha^t$, $\frac{3\delta}{p} C^2 nt\alpha^t$).

Proof. Let $X$ be the set of reduced words of length $n$ with two non-overlapping occurrences of a cyclic factor of length $t$ (resp. an occurrence of a cyclic factor of length $t$ and its inverse, two overlapping occurrences of a cyclic factor of length $t$). It suffices to apply Lemma 3.13 to the set $X$, and to use the results of Lemmas 3.11 and 3.12. □


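3.4. Measures on tuples of lengths and on tuples of words. For every positive integer $k$, let $\mathsf{T}_k$ denote the set of $k$-tuples of non-negative integers and $\mathsf{TW}_k$ denote the set of $k$-tuples of reduced words. Let also $\mathsf{T} = \bigcup_k \mathsf{T}_k$ and $\mathsf{TW} = \bigcup_k \mathsf{TW}_k$ be the sets of all tuples of non-negative integers, and of reduced words respectively.

For a given $\vec h = (h_1, \dots, h_k)$ of $\mathsf{TW}_k$, let $\|\vec h\|$ be the element of $\mathsf{T}_k$ given by $\|\vec h\| = (|h_1|, \dots, |h_k|)$.

A prefix-heavy sequence of measures on tuples of reduced words is a sequence $(\mathbb{P}_n)_{n \ge 0}$ of measures on $\mathsf{TW}$ such that for every $\vec h = (h_1, \dots, h_k)$ of $\mathsf{TW}$,

$$\mathbb{P}_n(\vec h) = \mathbb{T}_n(\|\vec h\|) \prod_{i=1}^{k} \mathbb{R}_{|h_i|}(h_i),$$

where $(\mathbb{T}_n)_{n \ge 0}$ is a sequence of measures on $\mathsf{T}$ and $(\mathbb{R}_n)_{n \ge 0}$ is a prefix-heavy sequence of measures on $\mathcal{R}$. If $(\mathbb{R}_n)_{n \ge 0}$ is prefix-heavy with parameters $(C, \alpha)$, then we say that $(\mathbb{P}_n)_{n \ge 0}$ is prefix-heavy with parameters $(C, \alpha)$.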


Remark 3.15. In the definition above, to draw a tuple of words according to $\mathbb{P}_n$, one can first draw a tuple of lengths $(\ell_1, \dots, \ell_k)$ following $\mathbb{T}_n$, and then draw, independently for each coordinate, an element of $\mathcal{R}_{\ell_i}$ following $\mathbb{R}_{\ell_i}$.
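The two-step procedure of Remark 3.15 translates directly into a sampler. The sketch below is a minimal illustration, not the authors' implementation: it assumes letters encoded as the integers $\pm 1, \dots, \pm r$, takes the word measure to be the uniform distribution on reduced words of a given length, and uses hypothetical names (`sample_tuple`, `random_reduced_word`, `draw_lengths`). The length distribution is passed as a function so that any choice of $\mathbb{T}_n$ can be plugged in.

```python
import random


def random_reduced_word(r, length, rng):
    """Uniform random reduced word of the given length: the first letter is
    uniform among 2r letters, each later letter among the 2r - 1 allowed ones."""
    letters = [x for x in range(-r, r + 1) if x != 0]
    word = []
    for _ in range(length):
        allowed = [x for x in letters if not word or x != -word[-1]]
        word.append(rng.choice(allowed))
    return tuple(word)


def is_reduced(word):
    """True if no letter is followed by its inverse."""
    return all(a != -b for a, b in zip(word, word[1:]))


def sample_tuple(draw_lengths, draw_word, rng):
    """Step 1: draw a tuple of lengths (the role of T_n).
    Step 2: draw each coordinate independently, given its length."""
    lengths = draw_lengths(rng)
    return tuple(draw_word(length, rng) for length in lengths)


# Example: 3 words of length exactly 8 in rank 2, so the length measure is
# concentrated on the single tuple (8, 8, 8).
rng = random.Random(0)
sample = sample_tuple(
    lambda g: (8, 8, 8),
    lambda length, g: random_reduced_word(2, length, g),
    rng,
)
```

Replacing the first lambda by a function that draws lengths at random gives samplers for tuples of words of varying lengths.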


Example 3.16. Let $\nu(n)$ be an integer-valued function. The uniform distribution on the $\nu(n)$-tuples of reduced words of length exactly $n$ is a prefix-heavy sequence of measures: one needs to take $\mathbb{T}_n$ to be the measure whose weight is entirely concentrated on the $\nu(n)$-tuple $(n, \dots, n)$ and $\mathbb{R}_n$ to be the uniform distribution on $\mathcal{R}_n$ (see Example 3.2).

The uniform distribution on the $\nu(n)$-tuples of reduced words of length at most $n$ is also a prefix-heavy sequence of measures. Here the support of $\mathbb{T}_n$ must be restricted to the tuples $(x_1, \dots, x_{\nu(n)})$ such that $x_i \le n$ for each $i$, with

$$\mathbb{T}_n(x_1, \dots, x_{\nu(n)}) = \prod_i \frac{|\mathcal{R}_{x_i}|}{|\mathcal{R}_{\le n}|}.$$

Both can be naturally adapted to handle the uniform distribution on the $\nu(n)$-tuples of cyclically reduced words of length exactly (resp. at most) $n$.

For appropriate functions $\nu(n)$, we retrieve the few-generator and the density models discussed in Section 2.2. We will see a more general class of examples in Section 4.
3.5. General statements. If $\vec x \in \mathsf{T}$, we denote by $\max(\vec x)$ and $\min(\vec x)$ the maximum and minimum element of $\vec x$. We also denote by $\mathrm{size}(\vec x)$ the integer $k$ such that $\vec x \in \mathsf{T}_k$.

The statistics min, max, and size are extended to tuples of words by setting $\min(\vec h) = \min(\|\vec h\|)$, $\max(\vec h) = \max(\|\vec h\|)$ and $\mathrm{size}(\vec h) = \mathrm{size}(\|\vec h\|)$. In the sequel we consider sequences of probability spaces on $\mathsf{TW}$ and min, max, and size are seen as random variables.

The following statements give general sufficient conditions for a tuple to generically have the central tree property, generate a malnormal subgroup, or satisfy a small cancellation property.

Proposition 3.17. Let $(\mathbb{P}_n)_{n \ge 0}$ be a prefix-heavy sequence of measures on tuples of reduced words of parameters $(C, \alpha)$. Let $f \colon \mathbb{N} \to \mathbb{N}$ be such that $f(\ell) \le \frac{\ell}{2}$ for each $\ell$. If there exists a sequence $(\eta_n)_{n \ge 0}$ of positive real numbers such that

$$\lim_{n \to \infty} \mathbb{P}_n\left(\mathrm{size}^2\, \alpha^{f(\min)} > \eta_n\right) = 0 \quad\text{and}\quad \lim_{n \to \infty} \eta_n = 0, \tag{1}$$

then a random tuple of words generically satisfies $\mathrm{lcp}(\vec h) < f(\min(\vec h))$.

If the limits in Equation (1) converge polynomially (resp. super-polynomially, exponentially) fast, then $\mathrm{lcp}(\vec h) < f(\min(\vec h))$ polynomially (resp. super-polynomially, exponentially) generically.

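Proof. The set of all tuples $\vec h$ that fail to satisfy the inequality $\mathrm{lcp}(\vec h) < f(\min(\vec h))$ is the union $\mathcal{G}_1 \cup \mathcal{G}_2$ of the two following sets:

• the set $\mathcal{G}_1$ of all tuples $\vec h = (h_1, \dots, h_k)$ such that for some $1 \le i < j \le k$, a word of length $f(\min(\vec h))$ occurs as a prefix of $h_i$ or $h_i^{-1}$, and also of $h_j$ or $h_j^{-1}$,

• the set $\mathcal{G}_2$ of all tuples $\vec h = (h_1, \dots, h_k)$ such that for some $1 \le i \le k$, $h_i$ and $h_i^{-1}$ have a common prefix of length $f(\min(\vec h))$,

and we only need to prove that $\lim_n \mathbb{P}_n(\mathcal{G}_1) = \lim_n \mathbb{P}_n(\mathcal{G}_2) = 0$.

Let $k, \ell$ be positive integers and let $X_{k,\ell}$ be the set of tuples $\vec h \in \mathsf{TW}_k$ such that $\min(\vec h) = \ell$. If $\vec h \in X_{k,\ell}$ and $1 \le i < j \le k$, then the probability that $h_i$ and $h_j$ have the same prefix of length $t = f(\ell)$ is

$$\sum_{w \in \mathcal{R}_t} \mathbb{R}_{|h_i|}(\mathcal{P}(w))\, \mathbb{R}_{|h_j|}(\mathcal{P}(w)) \le C\alpha^t \sum_{w \in \mathcal{R}_t} \mathbb{R}_{|h_j|}(\mathcal{P}(w)) \le C\alpha^t.$$

Then we have $\mathbb{P}_n(\mathcal{G}_1 \mid X_{k,\ell}) \le 4k^2 C\alpha^{f(\ell)}$, or rather $\mathbb{P}_n(\mathcal{G}_1 \mid X_{k,\ell}) \le \min(1, 4k^2 C\alpha^{f(\ell)})$, where the factor $k^2$ corresponds to the choice of $i$ and $j$ and the factor 4 corresponds to the possibilities that $h_i$ or $h_i^{-1}$, and $h_j$ or $h_j^{-1}$ have a common prefix of length $f(\ell)$. Therefore we have $\mathbb{P}_n(\mathcal{G}_1 \cap X_{k,\ell}) \le \min(1, 4k^2 C\alpha^{f(\ell)})\, \mathbb{P}_n(X_{k,\ell})$.

We can split the set of pairs $(k, \ell)$ into those pairs such that $k^2\alpha^{f(\ell)} > \eta_n$ and the others, for which $k^2\alpha^{f(\ell)} \le \eta_n$. Then we have

$$\mathbb{P}_n(\mathcal{G}_1) = \sum_{k,\ell} \mathbb{P}_n(\mathcal{G}_1 \cap X_{k,\ell}) \le \mathbb{P}_n\left(\mathrm{size}^2\, \alpha^{f(\min)} > \eta_n\right) + 4C\eta_n,$$

which tends to 0 under the hypothesis in Equation (1).

Similarly, if $\vec h \in X_{k,\ell}$ and $i \le k$, the probability that $h_i$ and $h_i^{-1}$ have a common prefix of length $f(\ell)$ is at most $C\alpha^{f(\ell)}$ by Corollary 3.8. It follows that $\mathbb{P}_n(\mathcal{G}_2 \mid X_{k,\ell}) \le \min(1, kC\alpha^{f(\ell)})$, and $\mathbb{P}_n(\mathcal{G}_2 \cap X_{k,\ell}) \le \min(1, kC\alpha^{f(\ell)})\, \mathbb{P}_n(X_{k,\ell})$.

Splitting the set of pairs $(k, \ell)$ into those pairs such that $k\alpha^{f(\ell)} > \eta_n$ and those for which $k\alpha^{f(\ell)} \le \eta_n$ yields

$$\mathbb{P}_n(\mathcal{G}_2) = \sum_{k,\ell} \mathbb{P}_n(\mathcal{G}_2 \cap X_{k,\ell}) \le \mathbb{P}_n\left(\mathrm{size}\, \alpha^{f(\min)} > \eta_n\right) + C\eta_n.$$

Now $\mathrm{size} \le \mathrm{size}^2$, so $\mathbb{P}_n(\mathrm{size}\, \alpha^{f(\min)} > \eta_n) \le \mathbb{P}_n(\mathrm{size}^2\, \alpha^{f(\min)} > \eta_n)$.

It follows that $\lim_n \mathbb{P}_n(\mathrm{size}\, \alpha^{f(\min)} > \eta_n) = 0$, and hence $\lim_n \mathbb{P}_n(\mathcal{G}_2) = 0$, which concludes the proof. □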


Theorem 3.18 (Central tree property). Let $(\mathbb{P}_n)_{n \ge 0}$ be a prefix-heavy sequence of measures on tuples of reduced words of parameters $(C, \alpha)$. If there exists a sequence $(\eta_n)_{n \ge 0}$ of positive real numbers such that

$$\lim_{n \to \infty} \mathbb{P}_n\left(\mathrm{size}^2\, \alpha^{\frac{\min}{2}} > \eta_n\right) = 0 \quad\text{and}\quad \lim_{n \to \infty} \eta_n = 0, \tag{2}$$

then a random tuple of words generically has the central tree property. In particular, such a tuple is a basis of the subgroup it generates.

If the limits in Equation (2) converge polynomially (resp. super-polynomially, exponentially) fast, then the central tree property holds polynomially (resp. super-polynomially, exponentially) generically.


Proof. By definition, a tuple $\vec h \in \mathsf{TW}$ satisfies the central tree property if $\mathrm{lcp}(\vec h) < \frac{1}{2}\min(\vec h)$. The theorem is a direct application of Proposition 3.17 to the function $f(\ell) = \frac{\ell}{2}$, and of Proposition 1.2. □


Theorem 3.19 (Malnormality). Let $(\mathbb{P}_n)_{n \ge 0}$ be a prefix-heavy sequence of measures on tuples of reduced words of parameters $(C, \alpha)$. If there exists a sequence $(\eta_n)_{n \ge 0}$ of positive real numbers such that

$$\lim_{n \to \infty} \mathbb{P}_n\left(\mathrm{size}^2\, \max^2\, \alpha^{\frac{\min}{8}} > \eta_n\right) = 0 \quad\text{and}\quad \lim_{n \to \infty} \eta_n = 0, \tag{3}$$

then a random tuple of words generically generates a malnormal subgroup.

If the limits in Equation (3) converge polynomially (resp. super-polynomially, exponentially) fast, then malnormality holds polynomially (resp. super-polynomially, exponentially) generically.


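Proof. By Proposition 1.3, a sufficient condition for a tuple $\vec h \in \mathsf{TW}$ to generate a malnormal subgroup is to have $\mathrm{lcp}(\vec h) < \frac{1}{3}\min(\vec h)$, and to not have two occurrences of a word of length $\frac{1}{2}(\min(\vec h) - 3\,\mathrm{lcp}(\vec h))$ as a factor of a word in $\vec h^{\pm 1}$. This condition is satisfied in particular if $\mathrm{lcp}(\vec h) < \frac{1}{4}\min(\vec h)$ and no word of length $\frac{1}{8}\min(\vec h)$ has two occurrences as a factor of a word in $\vec h^{\pm 1}$.

Therefore the set of all tuples $\vec h$ that generate a non-malnormal subgroup is contained in the union $\mathcal{G}_1 \cup \mathcal{G}_2 \cup \mathcal{G}_3 \cup \mathcal{G}_4$ of the following sets:

• the set $\mathcal{G}_1$ of all tuples $\vec h = (h_1, \dots, h_k)$ such that $\mathrm{lcp}(\vec h) \ge \frac{1}{4}\min(\vec h)$,

• the set $\mathcal{G}_2$ of all tuples $\vec h = (h_1, \dots, h_k)$ such that for some $1 \le i < j \le k$, a word of length $\frac{1}{8}\min(\vec h)$ occurs as a factor of $h_i$, and also of $h_j$ or $h_j^{-1}$,

• the set $\mathcal{G}_3$ of all tuples $\vec h = (h_1, \dots, h_k)$ such that for some $1 \le i \le k$, $h_i$ and $h_i^{-1}$ have a common factor of length $\frac{1}{8}\min(\vec h)$,

• the set $\mathcal{G}_4$ of all tuples $\vec h = (h_1, \dots, h_k)$ such that for some $1 \le i \le k$, $h_i$ has at least two occurrences of a factor of length $\frac{1}{8}\min(\vec h)$,

and we want to verify that $\mathbb{P}_n(\mathcal{G}_1)$, $\mathbb{P}_n(\mathcal{G}_2)$, $\mathbb{P}_n(\mathcal{G}_3)$ and $\mathbb{P}_n(\mathcal{G}_4)$ all tend to 0 when $n$ tends to infinity.

By Proposition 3.17, the set $\mathcal{G}_1$ is negligible as soon as $\lim_n \mathbb{P}_n(\mathrm{size}^2\, \alpha^{\frac{\min}{4}} > \eta_n) = 0$. This is true under the hypothesis in Equation (3) since $\mathrm{size}^2\, \alpha^{\frac{\min}{4}} \le \mathrm{size}^2\, \max^2\, \alpha^{\frac{\min}{8}}$, and hence $\mathbb{P}_n(\mathrm{size}^2\, \alpha^{\frac{\min}{4}} > \eta_n) \le \mathbb{P}_n(\mathrm{size}^2\, \max^2\, \alpha^{\frac{\min}{8}} > \eta_n)$.

Let now $X_{k,\ell,M}$ be the set of tuples $\vec h \in \mathsf{TW}_k$ such that $\min(\vec h) = \ell$ and $\max(\vec h) = M$. Let $1 \le i < j \le k$ and $\vec h \in X_{k,\ell,M}$. By Corollary 3.6, the probability that $h_j$ has a given factor $v$ of length $\frac{\ell}{8}$ is at most equal to $CM\alpha^{\frac{\ell}{8}}$. Summing this probability over all words $v$ which occur as a factor of $h_i$ (at most $|h_i| \le M$ such words), it follows that the probability that $h_i$ and $h_j$ have a common factor of length $t = \frac{\ell}{8}$ is at most equal to $CM^2\alpha^{\frac{\ell}{8}}$. Summing now over the possible values of $i$ and $j$, we find that $\mathbb{P}_n(\mathcal{G}_2 \cap X_{k,\ell,M}) \le \min(1, Ck^2M^2\alpha^{\frac{\ell}{8}})\, \mathbb{P}_n(X_{k,\ell,M})$ and therefore, as above,

$$\mathbb{P}_n(\mathcal{G}_2) \le \mathbb{P}_n\left(\mathrm{size}^2\, \max^2\, \alpha^{\frac{\min}{8}} > \eta_n\right) + C\eta_n.$$

It follows from Equation (3) that $\mathcal{G}_2$ is negligible.

By Lemma 3.7, the probability that $h_i$ and $h_i^{-1}$ have a common factor of length $\frac{\ell}{8}$ is at most $CM^2\alpha^{\frac{\ell}{8}}$. Summing over all choices of $i$, we find that

$$\mathbb{P}_n(\mathcal{G}_3) \le \mathbb{P}_n\left(\mathrm{size}\, \max^2\, \alpha^{\frac{\min}{8}} > \eta_n\right) + C\eta_n.$$

Since $\mathrm{size}\, \max^2\, \alpha^{\frac{\min}{8}} \le \mathrm{size}^2\, \max^2\, \alpha^{\frac{\min}{8}}$, we conclude that $\mathcal{G}_3$ is negligible.

Finally, we have $\mathbb{P}_n(\mathcal{G}_4) \le \frac{C}{8}\, \mathrm{size}\, \max\, \min\, \alpha^{\frac{\min}{8}}$ by Lemma 3.9, and hence

$$\mathbb{P}_n(\mathcal{G}_4) \le \mathbb{P}_n\left(\mathrm{size}\, \max\, \min\, \alpha^{\frac{\min}{8}} > \eta_n\right) + \frac{C}{8}\eta_n.$$

Since $\mathrm{size}\, \max\, \min\, \alpha^{\frac{\min}{8}} \le \mathrm{size}^2\, \max^2\, \alpha^{\frac{\min}{8}}$, it follows as above that the set $\mathcal{G}_4$ is negligible. □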


Theorem 3.20 (Small cancellations property). Let $(\mathbb{P}_n)_{n \ge 0}$ be a prefix-heavy sequence of measures on tuples of reduced words of parameters $(C, \alpha)$, such that $\liminf_n \mathbb{R}_n(\mathcal{C}_n) = p > 0$. For any $\lambda \in (0, \frac{1}{2})$, if there exists a sequence $(\eta_n)_{n \ge 0}$ of positive real numbers such that

$$\lim_{n \to \infty} \mathbb{P}_n\left(\mathrm{size}^2\, \max^2\, \alpha^{\lambda \min} > \eta_n\right) = 0 \quad\text{and}\quad \lim_{n \to \infty} \eta_n = 0, \tag{4}$$

then the property $C'(\lambda)$ generically holds.

If the limits in Equation (4) converge polynomially (resp. super-polynomially, exponentially) fast, then Property $C'(\lambda)$ holds polynomially (resp. super-polynomially, exponentially) generically.

Proof. A sufficient condition for a tuple of cyclically reduced words $\vec h$ to satisfy $C'(\lambda)$ is for every piece in $\vec h$ to have length less than $\lambda \min(\vec h)$. Then the set $\mathcal{G}$ of tuples that fail to satisfy $C'(\lambda)$ is contained in the union $\mathcal{G}_1 \cup \mathcal{G}_2 \cup \mathcal{G}_3 \cup \mathcal{G}_4$ of the following sets:

• the set $\mathcal{G}_1$ of all tuples of cyclically reduced words $\vec h = (h_1, \dots, h_k)$ such that for some $1 \le i < j \le k$, a word of length $\lambda \min(\vec h)$ occurs as a factor of $h_i$, and also of $h_j$ or $h_j^{-1}$,

• the set $\mathcal{G}_2$ of all tuples of cyclically reduced words $\vec h = (h_1, \dots, h_k)$ such that for some $1 \le i \le k$, $h_i$ has two non-overlapping occurrences of a factor of length $\lambda \min(\vec h)$,

• the set $\mathcal{G}_3$ of all tuples of cyclically reduced words $\vec h = (h_1, \dots, h_k)$ such that for some $1 \le i \le k$, $h_i$ has non-overlapping occurrences of a factor of length $\lambda \min(\vec h)$ and its inverse,

• the set $\mathcal{G}_4$ of all tuples of cyclically reduced words $\vec h = (h_1, \dots, h_k)$ such that for some $1 \le i \le k$, $h_i$ has overlapping occurrences of a factor of length $\lambda \min(\vec h)$,

and we want to verify that $\mathbb{P}_n(\mathcal{G}_1)$, $\mathbb{P}_n(\mathcal{G}_2)$, $\mathbb{P}_n(\mathcal{G}_3)$ and $\mathbb{P}_n(\mathcal{G}_4)$ all tend to 0 when $n$ tends to infinity.

As in the proof of Theorem 3.19, we find that the probability that a tuple of reduced words $\vec h$ is such that a word of length $\lambda \min(\vec h)$ occurs as a factor of $h_i$, and also of $h_j$ or $h_j^{-1}$, for some $i < j$, is at most $\mathbb{P}_n(\mathrm{size}^2\, \max^2\, \alpha^{\lambda \min} > \eta_n) + C\eta_n$. Reasoning as in the proof of Corollary 3.14, it follows that, for every $\delta > 1$,

$$\mathbb{P}_n(\mathcal{G}_1) \le \frac{\delta}{p}\left(\mathbb{P}_n\left(\mathrm{size}^2\, \max^2\, \alpha^{\lambda \min} > \eta_n\right) + C\eta_n\right),$$

and it follows from Equation (4) that $\mathcal{G}_1$ is negligible.

Now using Corollary 3.14 we show that

$$\mathbb{P}_n(\mathcal{G}_2),\ \mathbb{P}_n(\mathcal{G}_3) \le \frac{\delta}{p}\left(\mathbb{P}_n\left(\mathrm{size}\,(\max^2 + \max \min)\,\alpha^{\lambda \min} > \eta_n\right) + C^2\eta_n\right),$$

$$\mathbb{P}_n(\mathcal{G}_4) \le \frac{\delta}{p}\left(\mathbb{P}_n\left(\mathrm{size}\,(\max \min + \min^2)\,\alpha^{\lambda \min} > \eta_n\right) + 2C^2\eta_n\right).$$

Since $\mathrm{size}\, \max^2$, $\mathrm{size}\, \max \min$ and $\mathrm{size}\, \min^2$ are less than $\mathrm{size}^2\, \max^2$, the hypothesis in Equation (4) shows that $\mathcal{G}_2$, $\mathcal{G}_3$ and $\mathcal{G}_4$ are negligible, and this concludes the proof. □


3.6. Applications to the uniform distribution case. The few-generator model and the density model, based on the uniform distribution on reduced words of a given length and discussed in Section 2.2, are both instances of a prefix-heavy sequence of measures on tuples, for which the parameter $\alpha$ is $\alpha = \frac{1}{2r-1}$ (see Examples 3.2 and 3.16). In this section, the measure $\mathbb{R}_n$ is the uniform distribution on $\mathcal{R}_n$.

The results of Section 3.5 above allow us to retrieve many of the results in Section 2.2 (typically the results on the small cancellation property $C'(\lambda)$ up to density $\frac{\lambda}{2}$, whether one considers tuples of cyclically reduced words of length $n$ or of length at most $n$), and to expand them. In particular, we show that the results on the central tree property and malnormality in the few-generator model can be extended to the density model, and that we have a phase transition theorem for the central tree property (at density $\frac{1}{4}$).


Small cancellation properties. Let $0 < d < 1$. In the density model, at density $d$, we choose uniformly at random a $\nu(n)$-tuple of cyclically reduced words of length $n$, with $\nu(n) = |\mathcal{C}_n|^d$. In particular, for every tuple $\vec h$ of that sort, we have $\mathrm{size}(\vec h) = \nu(n)$ and $\max(\vec h) = \min(\vec h) = n$.

Let $0 < \lambda < \frac{1}{2}$ and for each $n$, let

$$\eta_n = \left(\frac{2r}{2r-1}\right)^{2d} n^2\, (2r-1)^{-(\lambda - 2d)n}.$$

Note that $|\mathcal{C}_n| \le |\mathcal{R}_n| = 2r(2r-1)^{n-1}$. Therefore $\mathrm{size}^2\, \max^2\, \alpha^{\lambda \min} \le \eta_n$ with probability 1. Now observe that $\eta_n$ converges exponentially fast to 0 when $d < \frac{\lambda}{2}$.

In view of Theorem 3.20, this provides a proof of part of Theorem 2.4 (2), namely, of the fact that, at density less than $\frac{\lambda}{2}$, Property $C'(\lambda)$ holds exponentially generically.

It is unclear whether the more difficult property, that hyperbolicity holds generically at density less than $\frac{1}{2}$, can be established with the same very general tools.

Observe that the set $\mathcal{R}_{\le n}$ of reduced words of length at most $n$ has cardinality $1 + \sum_{i=1}^{n} 2r(2r-1)^{i-1} = \frac{r}{r-1}(2r-1)^n - \frac{1}{r-1}$. By the same reasoning as above, at density less than $\frac{\lambda}{2}$, a tuple of cyclically reduced words of length at most $n$ exponentially generically has Property $C'(\lambda)$.

Properties of subgroups. We now return to tuples of reduced words like in the few-generator model, but with a density type assumption on the size of the tuples. For $0 < d < 1$, we consider $|\mathcal{R}_{\le n}|^d$-tuples of reduced words of length at most $n$, and the asymptotic properties of the subgroups generated by these tuples. For such tuples $\vec h$, we have $\mathrm{size}(\vec h) \le \left(\frac{r}{r-1}\right)^d (2r-1)^{dn}$ and $\max(\vec h) \le n$.

In addition, for every $0 < \mu < 1$, Proposition 2.1 shows that $\min(\vec h) > \mu n$, exponentially generically.

We first establish the central tree property. 

Proposition 3.21. Let $0 < d < \frac{1}{4}$. At density $d$, a tuple of reduced words of length at most $n$ chosen uniformly at random, exponentially generically has the central tree property, and in particular it is a basis of the subgroup it generates.

If $d > \frac{1}{4}$, then at density $d$ the central tree property exponentially generically does not hold.


Proof. For a fixed $\mu < 1$, the following inequality holds exponentially generically:

$$\mathrm{size}^2\, \alpha^{\frac{\min}{2}} \le \left(\frac{r}{r-1}\right)^{2d} (2r-1)^{-\left(\frac{\mu}{2} - 2d\right)n}.$$

At every density $d < \frac{1}{4}$, one can choose $\mu < 1$ such that $\frac{\mu}{2} - 2d > 0$ (say, $\mu = 2d + \frac{1}{2}$). For such a value of $\mu$, $\eta_n = \left(\frac{r}{r-1}\right)^{2d} (2r-1)^{-\left(\frac{\mu}{2} - 2d\right)n}$ converges exponentially fast to 0 and, in view of Theorem 3.18, this proves the first part of the proposition.

If $d > \frac{1}{4}$, let $d'$ be such that $\frac{1}{4} < d' < \min(\frac{1}{2}, d)$. By the classical Birthday Paradox^, exponentially generically two words of the tuple share a prefix of length $2d'n$. This proves the second part of the proposition. □
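The Birthday Paradox estimate invoked in the proof above says that a uniform random $k$-tuple over a set of size $M$ has pairwise distinct coordinates with probability $\prod_{i=1}^{k-1}(1 - \frac{i}{M}) \le \exp(-\frac{k(k-1)}{2M})$. The following sketch, with hypothetical function names, evaluates both sides so the inequality can be checked on concrete values; in the setting of the proof, $M$ would be the number of reduced words of length $2d'n$ and $k$ the size of the tuple.

```python
import math
from fractions import Fraction


def prob_all_distinct(M, k):
    """Exact probability that k independent uniform draws from a set of size M
    are pairwise distinct: (1 - 1/M)(1 - 2/M)...(1 - (k-1)/M)."""
    p = Fraction(1)
    for i in range(1, k):
        p *= Fraction(M - i, M)
    return p


def birthday_upper_bound(M, k):
    """Upper bound exp(-k(k-1)/(2M)), obtained from 1 - x <= exp(-x)."""
    return math.exp(-k * (k - 1) / (2 * M))


# Classical instance: 23 draws among 365 values.
exact = float(prob_all_distinct(365, 23))
bound = birthday_upper_bound(365, 23)

# Prefixes of length 4 in rank 2: M = 2r(2r-1)^3 = 108 possible prefixes.
M_prefixes = 4 * 3 ** 3
exact_prefixes = float(prob_all_distinct(M_prefixes, 40))
bound_prefixes = birthday_upper_bound(M_prefixes, 40)
```

When $k^2$ is much larger than $M$, the bound (and hence the probability of having no repeated prefix) is exponentially small, which is the regime used for $d > \frac{1}{4}$.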


Along the same lines, we also prove the following result. 

Proposition 3.22. Let $0 < d < \frac{1}{16}$. At density $d$, a tuple of reduced words of length at most $n$ chosen uniformly at random, exponentially generically generates a malnormal subgroup.


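Proof. For a fixed $\mu < 1$, we have

$$\mathrm{size}^2\, \max^2\, \alpha^{\frac{\min}{8}} \le \left(\frac{r}{r-1}\right)^{2d} n^2\, (2r-1)^{-\left(\frac{\mu}{8} - 2d\right)n}$$

exponentially generically.

If $d < \frac{1}{16}$, one can choose $\mu < 1$ such that $\frac{\mu}{8} - 2d > 0$ (say, $\mu = 8d + \frac{1}{2}$) and we conclude as above, letting

$$\eta_n = \left(\frac{r}{r-1}\right)^{2d} n^2\, (2r-1)^{-\left(\frac{\mu}{8} - 2d\right)n}$$

and using Theorem 3.19. □

^If $E$ is a set of size $M$ and $x$ is a uniform random tuple of $E^k$, the probability that the coordinates of $x$ are pairwise distinct is $(1 - \frac{1}{M})(1 - \frac{2}{M}) \cdots (1 - \frac{k-1}{M})$, which is at most $\exp(-\frac{k(k-1)}{2M})$ by direct calculations.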


Remark 3.23. Propositions 3.21 and 3.22 above generalize Corollary 2.2 (1) and (2), from the few-generator case to an exponential number of generators, up to density $\frac{1}{4}$ and $\frac{1}{16}$ respectively (see Proposition 1.5).

Proposition 3.21 can actually be radically refined if the tuples have less than exponential size and if we drop the requirement of exponential genericity.


Proposition 3.24. Let $f$ be an unbounded non-decreasing integer function. Let $k \ge 1$ be a fixed integer. Then a $k$-tuple $\vec h$ of reduced words of length at most $n$ chosen uniformly at random, generically has the central tree property, with $\mathrm{lcp}(\vec h) \le f(n)$.

Let $c, c' > 0$ be such that $c' \log(2r-1) > 2c$. Then an $n^c$-tuple $\vec h$ of reduced words of length at most $n$ chosen uniformly at random, generically has the central tree property, with $\mathrm{lcp}(\vec h) \le c' \log n$.


Proof. If $k$ is a fixed integer, then as in the proof of Proposition 3.21, we find that, for each $\mu < 1$, $\mathrm{size}^2\, \alpha^{f(\min)}$ is generically less than or equal to $\eta_n = k^2 (2r-1)^{-f(\mu n)}$, which tends to 0. This concludes the proof on the size of the central tree of random $k$-tuples by Proposition 3.17.

If we now consider $n^c$-tuples, we find that, for each $\mu < 1$, $\mathrm{size}^2\, \alpha^{c' \log(\min)}$ is generically less than or equal to $\eta_n = n^{2c} (2r-1)^{-c' \log(\mu n)} = \mu^{-c' \log(2r-1)}\, n^{-(c' \log(2r-1) - 2c)}$, which tends to 0. By Proposition 3.17 again, this concludes the proof. □


4. Markovian automata 


We now switch from the very general settings of the previous section to a specific 
and computable way to define prefix-heavy sequences of measures on reduced words. 

We introduce Markovian automata (Section 4.1), which determine prefix-heavy sequences of measures under a simple and natural non-triviality assumption. These automata are a form of hidden Markov chain, and when they have a classical ergodicity property, then cyclically reduced words have asymptotically positive density. We are then able to generalize the results of Section 3 about the central tree property and malnormality.

In the last part of the section, we give a generalization of Theorem 2.4 (2) and (3) on small cancellation and the degeneracy of a finite presentation.


4.1. Definition and examples. A Markovian automaton A consists of
• a deterministic transition system $(Q, \cdot)$ on alphabet X, where Q is a finite
non-empty set called the state set, and for each $q \in Q$ and $x \in X$, either $q \cdot x \in Q$
or $q \cdot x$ is undefined;

• an initial probability vector $\gamma_0 \in [0,1]^Q$, that is, a positive vector such
that $\sum_{q \in Q} \gamma_0(q) = 1$;

• for each $p \in Q$, a probability vector $(\gamma(p,x))_{x \in X} \in [0,1]^X$, such that
$\gamma(p,x) = 0$ if and only if $p \cdot x$ is undefined.

(Footnote: this notion is different from the two notions of probabilistic automata, introduced by Rabin
[26] and Segala and Lynch [28], respectively.)

If $u = x_0 \cdots x_n \in X^*$ ($n \ge 0$), we write $\gamma(q,u) = \gamma(q,x_0)\,\gamma(q \cdot x_0, x_1) \cdots \gamma(q \cdot (x_0 \cdots x_{n-1}), x_n)$.
We let $\gamma(q,u) = 1$ if u is the empty word. We also write
$\gamma_0(u) = \sum_{q \in Q} \gamma_0(q)\,\gamma(q,u)$.

Markovian automata are very similar to hidden Markov chain models, except 
that symbols are output on transitions instead of on states. We will discuss this 
further in Section 4.2 below. Markovian automata can be considered as more
intuitive since sets of words (languages) are naturally described by automata. 

We observe that, for each $n \ge 0$, $\sum_{|u|=n} \gamma_0(u) = 1$. Thus $\gamma_0$ determines a
probability measure $\mathbb{R}_n$ on the set of elements of $X^*$ of length n: if $|u| = n$, then
$\mathbb{R}_n(u) = \gamma_0(u)$.

In the sequel, we consider only Markovian automata on alphabet $\tilde A$, where
only reduced words have non-zero probability. More precisely, the support of a
Markovian automaton A is the set of words that can be read in A, starting from a
state q such that $\gamma_0(q) \ne 0$, that is, the set of all words u such that $\gamma_0(u) \ne 0$: we
assume that our Markovian automata are such that their support is contained in
$\mathcal{R}$.

Example 4.1. Uniform distribution on reduced words of length n. It is immediately
verified that the following Markovian automaton yields the uniform distribution
on reduced words of each possible length. The state set is $Q = \tilde A$. For each
$a \in \tilde A$, there is an a-labeled transition from every state except $a^{-1}$, ending in state
a. All these transitions have the same probability, namely $\frac{1}{2r-1}$, and the initial
probability vector is uniform as well, with each coordinate equal to $\frac{1}{2r}$.

One can also tweak these probabilities, to favor certain letters over others, or 
to favor positive letters (the letters in A) over negative letters. 
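
The automaton of Example 4.1 is small enough to be checked by direct enumeration. The following Python sketch is an illustration only: the encoding of letters as nonzero integers (i for a generator, -i for its inverse) and the names `uniform_automaton`, `gamma` and `gamma0_word` are assumptions made here, not notation from the text. It evaluates $\gamma(q,u)$ and $\gamma_0(u)$ as defined above, with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def uniform_automaton(r):
    """Sketch of the automaton of Example 4.1 for rank r.

    Letters are nonzero integers: i is a generator, -i its inverse.
    The state set is the set of letters; reading x leads to state x.
    """
    letters = list(range(1, r + 1)) + [-i for i in range(1, r + 1)]
    gamma0 = {q: Fraction(1, 2 * r) for q in letters}
    # delta[q][x] = (target state, probability); x = -q is left undefined.
    delta = {q: {x: (x, Fraction(1, 2 * r - 1)) for x in letters if x != -q}
             for q in letters}
    return letters, gamma0, delta


def gamma(delta, q, u):
    """gamma(q, u): product of the transition probabilities along u from q."""
    prob = Fraction(1)
    for x in u:
        if x not in delta[q]:
            return Fraction(0)  # the path is undefined
        q, step = delta[q][x]
        prob *= step
    return prob


def gamma0_word(gamma0, delta, u):
    """gamma_0(u): sum over states q of gamma_0(q) * gamma(q, u)."""
    return sum(gamma0[q] * gamma(delta, q, u) for q in gamma0)
```

With r = 2 and words of length 3, the 36 reduced words each receive probability 1/36 and all other words receive probability 0, which is the uniform distribution claimed in the example.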

Example 4.2. Distributions on rational subsets of F(A). The support of a
Markovian automaton A is always rational and closed under taking prefixes, but
it does not have to be equal to the set of all reduced words. We can consider
a rational subset L of F(A), or rather a deterministic transition system reading
only reduced words, and impose probabilistic weights on its transitions to form a
Markovian automaton. The resulting distribution gives non-zero weights only to
prefixes of elements of L.



Figure 3. Markovian automata A and A'.

Figure 3 represents two such automata (transitions are labeled by a letter and a
probability, and each state is decorated with the corresponding initial probability),
which are related to the modular group, $\mathrm{PSL}(2,\mathbb{Z}) = \langle a, b \mid a^2, b^3 \rangle$.
The support of the distribution defined by automaton A is the set of words
over alphabet $\{a, b, b^{-1}\}$ without occurrences of the factors $a^2$, $b^2$, $b^{-2}$, $bb^{-1}$ and
$b^{-1}b$, and the support of the distribution defined by A' consists of the words on
alphabet $\{a, b\}$, without occurrences of $a^2$ or $b^3$. Both are regular sets of unique
representatives of the elements of $\mathrm{PSL}(2,\mathbb{Z})$: the first is the set of geodesics of
$\mathrm{PSL}(2,\mathbb{Z})$, and also the set of Dehn-reduced words with respect to the given
presentation of that group; the second is a set of quasi-geodesics of $\mathrm{PSL}(2,\mathbb{Z})$. Notice
that the distribution produced by A' is not uniform on words of length n of its
support.


Example 4.1 shows that the sequence $(\mathbb{R}_n)_n$ of uniform measures on reduced
words, discussed in Sections 2.2 and 3.6, can be specified by a Markovian automaton.
We also know that this sequence is prefix-heavy (Example 3.2). This is a general
fact, under mild assumptions on the Markovian automaton.


Proposition 4.3. Let A be a Markovian automaton and let $(\mathbb{R}_n)_n$ be the sequence
of probability measures it determines. If A does not have a cycle with probability
1, then $(\mathbb{R}_n)_n$ is a prefix-heavy sequence of measures, with computable parameters
$(C, \alpha)$.


Proof. Let $\ell$ be the maximum length of an elementary cycle (one that does
not visit twice the same state) and let $\delta$ be the maximum value of $\gamma(q,\kappa)$ where $\kappa$
is an elementary cycle at state q. Under our hypothesis, $\delta < 1$.

Every cycle $\kappa$ can be represented as a composition of at least $|\kappa|/\ell$ elementary
cycles (here, the composition takes the form of a sequence of insertions of a cycle
in another). Consequently $\gamma(q,\kappa) \le \delta^{|\kappa|/\ell}$. Finally, every path can be seen as a
product of cycles and at most $|Q|$ individual edges. So, if u is a word and $q \in Q$,
then $\gamma(q,u) \le \delta^{(|u|-|Q|)/\ell}$, that is $\gamma(q,u) \le C\alpha^{|u|}$ where $C = \delta^{-|Q|/\ell}$ and $\alpha = \delta^{1/\ell}$.

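Let u, v be reduced words such that uv is reduced and let $n \ge |uv|$. We have

$$\mathbb{R}_n(\mathcal{P}(uv)) = \gamma_0(uv) = \sum_{p \in Q} \gamma_0(p)\,\gamma(p,u)\,\gamma(p \cdot u, v) \le \Big(\sum_{p \in Q} \gamma_0(p)\,\gamma(p,u)\Big) C\alpha^{|v|} = \gamma_0(u)\, C\alpha^{|v|} = \mathbb{R}_n(\mathcal{P}(u))\, C\alpha^{|v|},$$

and hence $\mathbb{R}_n(\mathcal{P}(uv) \mid \mathcal{P}(u)) \le C\alpha^{|v|}$, which concludes the proof. □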

Remark 4.4. The parameters C and $\alpha$ described in the proof of Proposition 4.3
may be far from optimal. If $\beta < 1$ is a uniform bound on the probabilities of the
transitions of A, then $\gamma_0(v), \gamma(q,v) \le \beta^{|v|}$ for each word v, and the computation
in the proof above shows that $\mathbb{R}_n(\mathcal{P}(uv) \mid \mathcal{P}(u)) \le \beta^{|v|}$. We will see in Section 4.2
that we can be more precise under additional hypotheses.


Now let A be a Markovian automaton without a probability 1 cycle, such that
the sequence of probability measures it induces is prefix-heavy with parameters
$(C, \alpha)$. If $0 < d < 1$, we say that a tuple h of reduced words of length at most
(resp. exactly) n is chosen at random according to A, at $\alpha$-density d if h consists
of $\alpha^{-dn}$ words. Observe that this generalizes the concept discussed in Section
and [3431 


2 . 2.2 
With the same proofs as in Section 3, we have the following generalization of
Propositions 3.21 and 3.22 related to the central tree property and malnormality.

Corollary 4.5. Let A be a Markovian automaton without a probability 1 
cycle, such that the induced sequence of probability measures is prefix-heavy with 
parameters {C, a). Then a tuple of reduced words of length at most n chosen at 
random according to A, at a-density d < exponentially generically has the central 
tree property. 

At a-density d < it exponentially generically generates a malnormal sub¬ 
group. 

4.2. Irreducible Markovian automata and coincidence probability.

An (n, n)-matrix M is said to be irreducible if it has non-negative coefficients and,
for every $i, j \le n$, there exists $s \ge 1$ such that $M^s(i,j) > 0$. Equivalently, this
means that M is not similar, via a permutation of the coordinates, to a block
upper-triangular matrix. We record the following general property of irreducible matrices.

Lemma 4.6. Let M be an irreducible matrix. Then its spectral radius $\rho$ is a
(positive) eigenvalue with a positive eigenvector. In particular, there exist positive
vectors $v_{\min}$ and $v_{\max}$ such that, componentwise,

$$\rho^n v_{\min} \le M^n \mathbf{1} \le \rho^n v_{\max} \quad \text{for all } n > 0,$$

where $\mathbf{1}$ is the vector whose coordinates are all equal to 1. Moreover, there exist
$c_{\min}, c_{\max} > 0$ such that

$$c_{\min}\,\rho^n \le \mathbf{1}^t M^n \mathbf{1} \le c_{\max}\,\rho^n \quad \text{for all } n > 0.$$

Proof. We refer the reader to [8, chap. 13, vol. 2] for a comprehensive
presentation of the properties of irreducible matrices and in particular for the
Perron-Frobenius theorem, which establishes that the spectral radius of M is an eigenvalue
with a positive eigenvector: let $v_0$ be such an eigenvector, and let $v_{\min}$ (resp. $v_{\max}$)
be appropriate multiples of $v_0$ with all coefficients less than 1 (resp. greater than 1).
Then we have, componentwise, $\rho^n v_{\min} = M^n v_{\min} \le M^n \mathbf{1} \le M^n v_{\max} = \rho^n v_{\max}$.
Let $c_{\min}$ (resp. $c_{\max}$) be the sum of the coefficients of $v_{\min}$ (resp. $v_{\max}$). Then,
summing over all components of $M^n v_{\min}$ and $M^n v_{\max}$, we get
$c_{\min}\rho^n \le \mathbf{1}^t M^n \mathbf{1} \le c_{\max}\rho^n$. □
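
The two-sided bound of Lemma 4.6 can be observed numerically on a small example. The sketch below is an illustration under stated assumptions: the matrix `M = [[1, 1], [1, 0]]` is chosen here (it is irreducible, with spectral radius the golden ratio) and does not come from the text, and the helper names are hypothetical. It computes $\mathbf{1}^t M^n \mathbf{1}$ exactly with integers and compares it with $\rho^n$.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def total_mass(matrix, n):
    """Return 1^t M^n 1, the sum of all entries of the n-th power of M."""
    size = len(matrix)
    power = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        power = mat_mul(power, matrix)
    return sum(sum(row) for row in power)


# Assumed example: an irreducible 0/1 matrix whose spectral radius
# is the golden ratio (1 + sqrt(5)) / 2.
M = [[1, 1], [1, 0]]
rho = (1 + 5 ** 0.5) / 2

# The ratios (1^t M^n 1) / rho^n should stay between two positive constants.
ratios = [total_mass(M, n) / rho ** n for n in range(31)]
```

On this example the ratios remain in a narrow band around 1.9, which is the behaviour the constants $c_{\min}$ and $c_{\max}$ capture.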


Going back to automata, we note that a Markov chain can be naturally associated
with a Markovian automaton: if A is a Markovian automaton on alphabet $\tilde A$,
with state set Q, we define the Markov chain M(A) on Q as follows: its transition
matrix is given by $M(p,q) = \sum_{a \in \tilde A \text{ s.t. } p \cdot a = q} \gamma(p,a)$ for all $p, q \in Q$, and its initial
vector is $\gamma_0$.

We say that the Markov chain M(A) (or, by extension, the Markovian automaton
A) is irreducible if this transition matrix is irreducible, which is equivalent to
the strong connectedness of A. We note that, in that case, if A does not consist
of a simple cycle, then A does not have a cycle of probability 1. In view of
Proposition 4.3, this implies that the sequence of probability measures determined by
A is prefix-heavy. We will see below (Proposition 4.9) that we can give a precise
evaluation of the parameters of this sequence.

To this end, we introduce the notion of local Markovian automata, where labels 
can be read on states instead of edges. 
More precisely, a Markovian automaton is local if all the incoming transitions
into a given state are labeled by the same letter: for all states p, q and letters a, b,
if $p \cdot a = q \cdot b$ then a = b. If A is a Markovian automaton, let A' denote the local
Markovian automaton obtained as follows.

• its set of states is $Q' = \{(q,a) \in Q \times \tilde A \mid \exists p \in Q,\ p \cdot a = q\}$;

• its transition function $\cdot$ is given by $(p,a) \cdot b = (q,b)$ if $p \cdot b = q$;

• its initial probability vector $\gamma'_0$ is given by $\gamma'_0((p,a)) = \gamma_0(p)$ if a is the least label of the transitions into p, and $\gamma'_0((p,a)) = 0$ otherwise
(we fix an arbitrary order on $\tilde A$);

• its transition probability vectors are given by $\gamma'((p,a),b) = \gamma(p,b)$.



Figure 4. A Markovian automaton and its associated local automaton. 


Proposition 4.7. Let A be a Markovian automaton. Then the associated local
Markovian automaton A' assigns the same probability as A to every reduced word.
Moreover, if A is irreducible, then so is A'.

Proof. The first part of the statement follows directly from the definition, by
a simple induction on the length of the words: indeed, we retrieve a path in A by
forgetting the second coordinate on the states of A'; and every path of A starting
at some state q can be lifted uniquely to a path in A' starting at any vertex of the
form (q, a) of A'.

Assume that A is irreducible and let (p, a) and (q, b) be states of A'. By
definition of A', there exists a state q' of A such that $q' \cdot b = q$. Moreover, since
A is irreducible, there exists a path from p to q' in A, say
$p \xrightarrow{a_1} q_1 \xrightarrow{a_2} \cdots \xrightarrow{a_k} q'$.
Then

$$(p,a) \xrightarrow{a_1} (q_1,a_1) \xrightarrow{a_2} \cdots \xrightarrow{a_k} (q',a_k) \xrightarrow{b} (q,b)$$

is a path in A' from (p, a) to (q, b), so A' is irreducible as well. □
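
The construction of the local automaton and the first claim of Proposition 4.7 can be tested on a toy example. The sketch below is an illustration only: the two-state automaton on the letters 'a' and 'b', its weights, and the function names are assumptions made here. It also assumes that every state with positive initial probability has at least one incoming transition, so that no initial mass is lost.

```python
from fractions import Fraction
from itertools import product

# Hypothetical non-local automaton: state 1 has incoming 'a' and 'b' edges.
# delta[p][x] = (target state, probability of reading x from p).
delta = {
    0: {'a': (1, Fraction(1, 2)), 'b': (1, Fraction(1, 2))},
    1: {'a': (0, Fraction(1, 3)), 'b': (1, Fraction(2, 3))},
}
gamma0 = {0: Fraction(1, 4), 1: Fraction(3, 4)}


def local_automaton(gamma0, delta):
    """Build the local automaton: states are pairs (state, incoming letter)."""
    states = {(q, x) for p in delta for x, (q, _) in delta[p].items()}
    new_delta = {(p, a): {b: ((q, b), pr) for b, (q, pr) in delta[p].items()}
                 for (p, a) in states}
    new_gamma0 = {}
    for (p, a) in states:
        # The initial mass of p goes to the copy with the least incoming label.
        least = min(x for (q, x) in states if q == p)
        new_gamma0[(p, a)] = gamma0[p] if a == least else Fraction(0)
    return new_gamma0, new_delta


def word_probability(gamma0, delta, u):
    """gamma_0(u) for an automaton given by (gamma0, delta)."""
    total = Fraction(0)
    for start, weight in gamma0.items():
        state, prob = start, weight
        for x in u:
            if x not in delta[state]:
                prob = Fraction(0)
                break
            state, step = delta[state][x]
            prob *= step
        total += prob
    return total


local_gamma0, local_delta = local_automaton(gamma0, delta)
```

Comparing `word_probability` on both automata for all short words over 'a', 'b' gives identical values, as the proposition predicts.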


If A is a Markovian automaton, we denote by $M_A$ (or just M when there is no
ambiguity) the stochastic matrix associated with its local automaton A':

$$M((p,a),(q,b)) = \begin{cases} \gamma'((p,a),b) = \gamma(p,b) & \text{if } p \cdot b = q, \\ 0 & \text{otherwise.} \end{cases}$$
We also denote by $M_{[2]}$ and $M_{[3]}$ the matrices defined by

$$M_{[2]}((p,a),(q,b)) = \big(M((p,a),(q,b))\big)^2 \quad \text{and} \quad M_{[3]}((p,a),(q,b)) = \big(M((p,a),(q,b))\big)^3,$$

and by $\alpha_{[2]}$ and $\alpha_{[3]}$ the largest eigenvalues of $M_{[2]}$ and $M_{[3]}$, respectively. The value
$\alpha_{[2]}$ is called the coincidence probability of A, and it will play an important role in
the sequel.
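
As a sanity check on these definitions, the coincidence probability of the uniform automaton of Example 4.1 can be estimated numerically. This is a sketch under assumptions: the integer encoding of letters and the function names are chosen here, and the power iteration used to approximate the largest eigenvalue is a standard technique that is only expected to converge for irreducible aperiodic non-negative matrices. The uniform automaton is already local, so its matrix M can be written directly on the state set of letters.

```python
def uniform_local_matrix(r):
    """Matrix M of the (already local) uniform automaton of rank r."""
    letters = list(range(1, r + 1)) + [-i for i in range(1, r + 1)]
    p = 1.0 / (2 * r - 1)
    # From state x, every letter y except the inverse of x can be read.
    return [[p if y != -x else 0.0 for y in letters] for x in letters]


def entrywise_power(matrix, k):
    """M_[k]: raise every coefficient of M to the power k."""
    return [[entry ** k for entry in row] for row in matrix]


def spectral_radius(matrix, iterations=200):
    """Power iteration (assumes an irreducible aperiodic non-negative matrix)."""
    n = len(matrix)
    v = [1.0 / n] * n
    estimate = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        estimate = sum(w)  # valid estimate because the entries of v sum to 1
        v = [x / estimate for x in w]
    return estimate


M = uniform_local_matrix(2)
alpha2 = spectral_radius(entrywise_power(M, 2))
alpha3 = spectral_radius(entrywise_power(M, 3))
```

For the uniform automaton each row of $M_{[2]}$ has $2r-1$ coefficients equal to $(2r-1)^{-2}$, so $\alpha_{[2]} = \frac{1}{2r-1}$, and $\alpha_{[2]}^{-dn} = (2r-1)^{dn}$ is the usual number of words at density d.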

Observe that if A is local, then A' is equal to A up to the names of the states.
We are interested in local automata for the following properties. 

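Lemma 4.8. Let A be a local Markovian automaton. Then the following holds:

• for all states p, q there is at most one transition from p to q;

• two paths starting from the same state are labeled by the same word if and
only if they go through the same states in the same order;

• for every $\ell \ge 0$, we have $M^\ell(p,q) = \sum_{u \in \mathcal{R}_\ell,\ p \cdot u = q} \gamma(p,u)$, $M_{[2]}^\ell(p,q) = \sum_{u \in \mathcal{R}_\ell,\ p \cdot u = q} \gamma(p,u)^2$ and $M_{[3]}^\ell(p,q) = \sum_{u \in \mathcal{R}_\ell,\ p \cdot u = q} \gamma(p,u)^3$.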

We can now give an upper bound for the parameters of the sequence of proba¬ 
bility measures determined by an irreducible Markovian automaton. 


Proposition 4.9. Let A be an irreducible Markovian automaton with
coincidence probability $\alpha_{[2]}$, and let $(\mathbb{R}_n)_n$ be the sequence of probability measures it
determines. If A does not consist of a single cycle, then there exists a constant
$C > 0$ such that $(\mathbb{R}_n)_n$ is prefix-heavy with parameters $(C, \alpha_{[2]}^{1/2})$.


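Proof. Let v be a reduced word of length $\ell$ and let $q \in Q$ be a state of A. By
Lemma 4.8, we have

$$\gamma(q,v) = \sqrt{\gamma(q,v)^2} \le \sqrt{M_{[2]}^\ell(q, q \cdot v)} \le \sqrt{\mathbf{1}^t M_{[2]}^\ell \mathbf{1}}.$$

Lemma 4.6 then shows that there exists $C > 0$ such that $\gamma(q,v) \le C \alpha_{[2]}^{\ell/2}$. We can
now conclude as in the proof of Proposition 4.3. □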

This yields the following refinement of Corollary 4.5.


Corollary 4.10. Let A be a Markovian automaton without a probability 1
cycle and with coincidence probability $\alpha_{[2]}$. Then a tuple of reduced words of length
at most n chosen at random according to A, at a[ 2 ]-density d < ^ (resp. d < -A 
exponentially generically has the central tree property (resp. generates a malnormal 
subgroup). 


4.3. Ergodic Markovian automata. If the Markovian automaton A is
irreducible and if, in addition, for all large enough n, $M(A)^n(q,q) > 0$ for each $q \in Q$,
we say that A (resp. M(A)) is ergodic. This is equivalent to stating that A has a
collection of loops of relatively prime lengths, or also that all large enough integral
powers of M(A) have only positive coefficients. If A is ergodic, we can apply a
classical theorem on Markov chains, which states that there exists a stationary
vector $\tilde\gamma$ such that the distribution defined by A converges to that stationary vector
exponentially fast (see [18, Thm 4.9]). In the vocabulary of Markovian automata,
this yields the following theorem.
If $u \in \tilde A^*$ has length n, let $Q_n(u) = p \cdot u$ be the state of A reached after reading
the word u starting at state p. We treat $Q_n$ as a random variable.

Theorem 4.11. Let A be an ergodic Markovian automaton on alphabet $\tilde A$, with
state set Q ($|Q| \ge 2$). For each $q \in Q$, the limit $\lim_{n \to \infty} \mathbb{R}_n[Q_n = q]$ exists, and if
we denote it by $\tilde\gamma(q)$, then $\tilde\gamma$ is a probability vector (called the stationary vector).
In addition, there exist $K > 0$ and $0 < c < 1$ such that
$|\mathbb{R}_n[Q_n = q] - \tilde\gamma(q)| \le K c^n$
for all n large enough.


Remark 4.12. The constant c in Theorem 4.11 is the maximal modulus of the
eigenvalues of M(A) other than 1.


Example 4.13. The Markovian automaton discussed in Example 4.1, relative
to the uniform distribution on reduced words of length n, is ergodic. Its stationary
vector $\tilde\gamma$ is equal to $\gamma_0$ ($\tilde\gamma(q) = \frac{1}{2r}$ for every state q), and the constant c is $\frac{1}{2r-1}$.
On the other hand, the Markovian automaton A in Example 4.2 is irreducible
but not ergodic (loops have even lengths), and it does not have a stationary vector.
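
The exponential convergence stated in Theorem 4.11 can be observed on the uniform automaton. The following sketch is illustrative: the function names and the choice of starting the chain in a single state (rather than from the uniform initial vector, which is already stationary) are assumptions made here. It iterates the Markov chain M(A) of Example 4.1 with exact fractions and measures the largest deviation from the stationary value $\frac{1}{2r}$.

```python
from fractions import Fraction


def uniform_chain(r):
    """Transition matrix of the Markov chain M(A) for the uniform automaton."""
    letters = list(range(1, r + 1)) + [-i for i in range(1, r + 1)]
    p = Fraction(1, 2 * r - 1)
    matrix = [[p if y != -x else Fraction(0) for y in letters] for x in letters]
    return letters, matrix


def step(distribution, matrix):
    """One step of the chain: row vector times transition matrix."""
    n = len(matrix)
    return [sum(distribution[i] * matrix[i][j] for i in range(n))
            for j in range(n)]


def max_error_after(r, n):
    """Largest gap to the stationary vector after n steps from a point mass."""
    letters, matrix = uniform_chain(r)
    dist = [Fraction(1)] + [Fraction(0)] * (len(letters) - 1)
    for _ in range(n):
        dist = step(dist, matrix)
    target = Fraction(1, 2 * r)
    return max(abs(x - target) for x in dist)
```

For r = 2 the deviation after n steps stays below $3^{-n}$, in line with the constant $c = \frac{1}{2r-1}$ of Example 4.13.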


We use Theorem 4.11 to show that, under a very mild additional hypothesis,
an ergodic Markovian automaton yields a prefix-heavy sequence of measures $(\mathbb{R}_n)_n$
such that $\liminf \mathbb{R}_n(\mathcal{C}) > 0$.


Proposition 4.14. Let A be an ergodic Markovian automaton, with initial
vector $\gamma_0$ and stationary vector $\tilde\gamma$, and let $(\mathbb{R}_n)_n$ be the sequence of measures it
induces on reduced words. If $\sum_{a \in \tilde A} \gamma_0(a)\,\tilde\gamma(a^{-1}) \ne 1$, then $\liminf \mathbb{R}_n(\mathcal{C}) > 0$.

Observe that the sum $\sum_{a \in \tilde A} \gamma_0(a)\,\tilde\gamma(a^{-1})$ is less than 1, since we are dealing
with probability vectors, unless there exists a (necessarily single) letter a such that
$\gamma_0(a) = \tilde\gamma(a^{-1}) = 1$.


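Proof. The set $\mathcal{C}$ of cyclically reduced words is the complement in $\mathcal{R}$ of the
disjoint union of the sets $a\tilde A^* a^{-1}$ ($a \in \tilde A$). Now we have

$$\begin{aligned}
\mathbb{R}_n(a\tilde A^* a^{-1}) &= \sum_{p \in Q} \gamma_0(p)\,\gamma(p,a) \Big( \sum_{|u| = n-2} \gamma(p \cdot a, u)\,\gamma(p \cdot (au), a^{-1}) \Big) \\
&= \sum_{p \in Q} \gamma_0(p)\,\gamma(p,a) \Big( \sum_{q \in Q} \mathbb{R}_n\big(Q^{p \cdot a}_{n-2} = q\big)\,\gamma(q, a^{-1}) \Big) \\
&= \sum_{p \in Q} \gamma_0(p)\,\gamma(p,a) \Big( \sum_{q \in Q} \big(\tilde\gamma(q) + \varepsilon(q,n)\big)\,\gamma(q, a^{-1}) \Big),
\end{aligned}$$

where $Q^{p \cdot a}_{n-2}$ denotes the state reached after reading $n-2$ letters from state $p \cdot a$, and
$|\varepsilon(q,n)| \le K c^{n-2}$, with K and c given by Theorem 4.11. Then we have

$$\mathbb{R}_n(a\tilde A^* a^{-1}) = \gamma_0(a)\,\tilde\gamma(a^{-1}) + \gamma_0(a) \Big( \sum_{q \in Q} \varepsilon(q,n)\,\gamma(q, a^{-1}) \Big)$$

and $\lim \mathbb{R}_n(a\tilde A^* a^{-1}) = \gamma_0(a)\,\tilde\gamma(a^{-1})$. It follows that

$$\lim \mathbb{R}_n(\mathcal{C}) = 1 - \sum_{a \in \tilde A} \gamma_0(a)\,\tilde\gamma(a^{-1}),$$

thus concluding the proof. □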
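
For the uniform distribution, the limit obtained in this proof can be compared with a brute-force count. The sketch below is an illustration only: the integer encoding of letters and the function names are assumptions made here, and the value $\frac{1}{2r}$ used for both $\gamma_0(a)$ and $\tilde\gamma(a^{-1})$ is the one that the uniform automaton of Example 4.1 gives when $\tilde\gamma(a^{-1})$ is read as $\sum_q \tilde\gamma(q)\gamma(q,a^{-1})$.

```python
from fractions import Fraction
from itertools import product


def cyclically_reduced_fraction(r, n):
    """Exact proportion of cyclically reduced words among reduced words of length n."""
    letters = list(range(1, r + 1)) + [-i for i in range(1, r + 1)]
    reduced = cyclic = 0
    for w in product(letters, repeat=n):
        if all(x != -y for x, y in zip(w, w[1:])):
            reduced += 1
            # Cyclically reduced: the last letter is not the inverse of the first.
            if w[0] != -w[-1]:
                cyclic += 1
    return Fraction(cyclic, reduced)


def predicted_limit(r):
    """1 - sum over letters a of gamma_0(a) * tilde-gamma(a^{-1}), uniform case."""
    first_letter = Fraction(1, 2 * r)        # gamma_0(a)
    stationary_letter = Fraction(1, 2 * r)   # tilde-gamma(a^{-1})
    return 1 - 2 * r * first_letter * stationary_letter
```

For r = 2 the predicted limit is 3/4, and the exact proportion at n = 8 already agrees with it to three decimal places.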
Proceeding as in Section 3.6, we can use Proposition 4.14, Corollary 3.14 and
the results of Section 3.5 to generalize part of Theorem 2.4 (2), and show that, up 
to a[ 2 ]-density a tuple of cyclically reduced words of length at most n chosen 
at random according to A, exponentially generically satisfies the small cancellation
property $C'(\lambda)$. We will now see (Theorem 4.15) that we can improve this bound,
and go up to a p] "density 


4.4. Phase transitions for the Markovian model. We can now state a
phase transition theorem, which generalizes parts of Theorem 2.4. Let us say that
an ergodic Markovian automaton is non-degenerate if its initial distribution $\gamma_0$ and
its stationary vector $\tilde\gamma$ satisfy $\sum_{a \in \tilde A} \gamma_0(a)\,\tilde\gamma(a^{-1}) \ne 1$.


Theorem 4.15. Let A be a non-degenerate ergodic Markovian automaton with
coincidence probability $\alpha_{[2]}$. Let $0 < d < 1$ and let G be the group presented by a
tuple h of cyclically reduced words of length n, chosen independently and at random
according to A, at $\alpha_{[2]}$-density d. Then we have the following phase transitions:

• if $0 < \lambda < \frac{1}{2}$ and $0 < d < \frac{\lambda}{2}$, then exponentially generically h satisfies the
small cancellation property $C'(\lambda)$; if $\lambda = \frac{1}{6}$, then G is generically infinite
and hyperbolic;

• if $d > \frac{\lambda}{2}$, then exponentially generically h does not satisfy the small
cancellation property $C'(\lambda)$;

• if $d > \frac{1}{2}$, then exponentially generically G is degenerate in a sense that is
made precise in Proposition 4.23, and which implies that G is a free group
or the free product of a free group with $\mathbb{Z}/2\mathbb{Z}$.


The rest of the paper is devoted to the proof of Theorem 4.15. The first
statement is established in Proposition 4.16, while the second and third statements
are proved respectively in Propositions 4.22 and 4.23.


4.5. Long common factors at low density. In this section we estimate
the probability that random words share a long common factor. More precisely, we
show the following statement, the first part of Theorem 4.15.

Proposition 4.16. Let A be a non-degenerate ergodic Markovian automaton
with coincidence probability $\alpha_{[2]}$. Let $\lambda \in (0, \frac{1}{2})$ and let $d \in (0, \frac{\lambda}{2})$. A tuple of
cyclically reduced words of length n taken independently and randomly according
to A, at $\alpha_{[2]}$-density d, exponentially generically satisfies the small cancellation
property $C'(\lambda)$.

The structure of the proof of Proposition 4.16 resembles that of the proof of
Theorem 3.20, and requires the consideration of several cases. This is the object of
the rest of Section 4.5.

To this end, we introduce additional notation: let $\gamma_q(\ell)$ be the vector of
coordinates $\gamma(q,u)$ when u ranges over $\mathcal{R}_\ell$ in lexicographic order, and let
$\|\gamma_q(\ell)\|_k = \big(\sum_{u \in \mathcal{R}_\ell} \gamma(q,u)^k\big)^{1/k}$ be the $\ell_k$-norm of this vector. We start with an elementary
result.


Lemma 4.17. Let A be a Markovian automaton, let $0 < i, \ell \le n$ be integers,
and let $u \in \mathcal{R}_\ell$. The probability p that u occurs as a cyclic factor at position i in a
reduced word of length n is bounded above by

$$\begin{cases} \sum_{q \in Q} \gamma(q,u) & \text{if } i \le n - \ell + 1, \\ \sum_{q,q' \in Q} \gamma(q,u_1)\,\gamma(q',u_2) & \text{if } i > n - \ell + 1 \text{ and } u = u_1 u_2 \text{ with } |u_1| = n - i + 1. \end{cases}$$
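Proof. If $i \le n - \ell + 1$, then $p = \mathbb{R}_n(\tilde A^{i-1} u \tilde A^{n-\ell-i+1})$ is equal to

$$\sum_{p \in Q} \gamma_0(p) \sum_{w \in \mathcal{R}_{i-1}} \gamma(p,w)\,\gamma(p \cdot w, u) = \sum_{p \in Q} \gamma_0(p) \sum_{q \in Q} \Big( \sum_{w \in \mathcal{R}_{i-1},\ p \cdot w = q} \gamma(p,w) \Big) \gamma(q,u) \le \sum_{p,q \in Q} \gamma_0(p)\,\gamma(q,u) = \sum_{q \in Q} \gamma(q,u),$$

where the inequality holds because each inner sum over w is a probability, and hence at most 1.

If $i > n - \ell + 1$ and $u = u_1 u_2$ with $|u_1| = n - i + 1$, then

$$\begin{aligned}
p = \mathbb{R}_n(u_2 \tilde A^{n-\ell} u_1) &= \sum_{q' \in Q} \gamma_0(q')\,\gamma(q',u_2) \sum_{w \in \mathcal{R}_{n-\ell}} \gamma(q' \cdot u_2, w)\,\gamma(q' \cdot u_2 w, u_1) \\
&= \sum_{q' \in Q} \gamma_0(q')\,\gamma(q',u_2) \sum_{q \in Q} \Big( \sum_{w \in \mathcal{R}_{n-\ell},\ q' \cdot u_2 w = q} \gamma(q' \cdot u_2, w) \Big) \gamma(q,u_1) \\
&\le \sum_{q,q' \in Q} \gamma(q,u_1)\,\gamma(q',u_2),
\end{aligned}$$

again because each inner sum over w is at most 1 and $\gamma_0(q') \le 1$. This concludes the proof. □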

Proposition 4.18. Let A be an irreducible Markovian automaton with
coincidence probability $\alpha_{[2]}$. Let n, $\ell$, i and j be positive integers such that $\ell \le n$ and
$i, j \le n$. Denote by $L(n,\ell,i,j)$ the probability that two reduced words of length n
share a common cyclic factor of length $\ell$ at positions respectively i and j. Then
there exists a positive constant K such that

$$L(n,\ell,i,j) \le K \alpha_{[2]}^\ell.$$


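Proof. Without loss of generality (see Proposition 4.7), we may assume that
A is local. The proof is based on a case study.

Case 1: $i, j \le n - \ell + 1$. Using Lemma 4.17, we have

$$L(n,\ell,i,j) \le \sum_{u \in \mathcal{R}_\ell}\ \sum_{p,q \in Q} \gamma(p,u)\,\gamma(q,u).$$

By a repeated application of the Cauchy-Schwarz inequality, we get

$$(5) \qquad L(n,\ell,i,j) \le \sum_{p,q \in Q} \|\gamma_p(\ell)\|_2\, \|\gamma_q(\ell)\|_2 \le |Q| \sum_{q \in Q} \|\gamma_q(\ell)\|_2^2.$$

Now, in view of Lemma 4.8 and since A is local, we have

$$(6) \qquad \sum_{q \in Q} \|\gamma_q(\ell)\|_2^2 = \sum_{q \in Q} \sum_{u \in \mathcal{R}_\ell} \gamma(q,u)^2 = \sum_{p \in Q} \sum_{q \in Q} \sum_{u \in \mathcal{R}_\ell,\ p \cdot u = q} \gamma(p,u)^2 = \mathbf{1}^t M_{[2]}^\ell \mathbf{1}.$$

Since M is irreducible, Lemma 4.6 shows that there exists a positive constant K
such that, for $\ell$ large enough, we have

$$L(n,\ell,i,j) \le |Q| \sum_{q \in Q} \|\gamma_q(\ell)\|_2^2 = |Q|\, \mathbf{1}^t M_{[2]}^\ell \mathbf{1} \le K \alpha_{[2]}^\ell,$$

which concludes the proof of the statement in that case.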
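Case 2: $i > n - \ell + 1$ and $j \le n - \ell + 1$. (The case where $i \le n - \ell + 1$ and
$j > n - \ell + 1$ is symmetrical.) Let $k = n - i + 1$ (so $1 \le k < \ell$). By Lemma 4.17,
we have

$$\begin{aligned}
L(n,\ell,i,j) &\le \sum_{u_1 \in \mathcal{R}_k,\ u_2 \in \mathcal{R}_{\ell-k}}\ \sum_{p,p',q \in Q} \gamma(p,u_1)\,\gamma(p',u_2)\,\gamma(q,u_1 u_2) \\
&\le \sum_{u_1 \in \mathcal{R}_k,\ u_2 \in \mathcal{R}_{\ell-k}}\ \sum_{p,p',q,q' \in Q} \gamma(p,u_1)\,\gamma(p',u_2)\,\gamma(q,u_1)\,\gamma(q',u_2) \\
&\le \Big( \sum_{u_1 \in \mathcal{R}_k}\ \sum_{p,q \in Q} \gamma(p,u_1)\,\gamma(q,u_1) \Big) \Big( \sum_{u_2 \in \mathcal{R}_{\ell-k}}\ \sum_{p',q' \in Q} \gamma(p',u_2)\,\gamma(q',u_2) \Big).
\end{aligned}$$

By Cauchy-Schwarz, it follows that

$$\begin{aligned}
L(n,\ell,i,j) &\le \Big( \sum_{p,q \in Q} \|\gamma_p(k)\|_2\, \|\gamma_q(k)\|_2 \Big) \Big( \sum_{p',q' \in Q} \|\gamma_{p'}(\ell-k)\|_2\, \|\gamma_{q'}(\ell-k)\|_2 \Big) \\
&\le \Big( |Q| \sum_{q \in Q} \|\gamma_q(k)\|_2^2 \Big) \Big( |Q| \sum_{q \in Q} \|\gamma_q(\ell-k)\|_2^2 \Big) \\
&= \big( |Q|\, \mathbf{1}^t M_{[2]}^k \mathbf{1} \big) \big( |Q|\, \mathbf{1}^t M_{[2]}^{\ell-k} \mathbf{1} \big) \quad \text{by Equation (6).}
\end{aligned}$$

By Lemma 4.6, there exists a constant $K_1$ such that these two factors are bounded
above, respectively, by $K_1 \alpha_{[2]}^k$ and $K_1 \alpha_{[2]}^{\ell-k}$. Therefore

$$L(n,\ell,i,j) \le K_1^2\, \alpha_{[2]}^\ell,$$

as announced.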

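Case 3: $i, j > n - \ell + 1$. Without loss of generality, we may assume that $i < j$,
and we let $k = n - j + 1$ and $k' = \ell - (n - i + 1)$. Then a word u of length $\ell$
occurs as a cyclic factor in two reduced words $w_1$ and $w_2$ of length n, at positions
i and j respectively, if $u = u_1 u_2 u_3$ with $|u_1| = k$, $|u_2| = j - i$ and $|u_3| = k'$, and if
$w_1 \in u_3 \tilde A^{n-\ell} u_1 u_2$ and $w_2 \in u_2 u_3 \tilde A^{n-\ell} u_1$. Then we have

$$\begin{aligned}
L(n,\ell,i,j) &\le \sum_{u_1 \in \mathcal{R}_k,\ u_2 \in \mathcal{R}_{j-i},\ u_3 \in \mathcal{R}_{k'}}\ \sum_{p,p',q,q'' \in Q} \gamma(q,u_1 u_2)\,\gamma(q'',u_3)\,\gamma(p,u_1)\,\gamma(p',u_2 u_3) \\
&\le \sum_{u_1, u_2, u_3}\ \sum_{p,p',p'',q,q',q'' \in Q} \gamma(q,u_1)\,\gamma(q',u_2)\,\gamma(q'',u_3)\,\gamma(p,u_1)\,\gamma(p',u_2)\,\gamma(p'',u_3) \\
&= \Big( \sum_{u_1 \in \mathcal{R}_k}\ \sum_{p,q \in Q} \gamma(q,u_1)\,\gamma(p,u_1) \Big) \Big( \sum_{u_2 \in \mathcal{R}_{j-i}}\ \sum_{p',q' \in Q} \gamma(q',u_2)\,\gamma(p',u_2) \Big) \Big( \sum_{u_3 \in \mathcal{R}_{k'}}\ \sum_{p'',q'' \in Q} \gamma(q'',u_3)\,\gamma(p'',u_3) \Big).
\end{aligned}$$

By the Cauchy-Schwarz inequality, $L(n,\ell,i,j)$ is at most equal to

$$\Big( \sum_{p,q \in Q} \|\gamma_p(k)\|_2\, \|\gamma_q(k)\|_2 \Big) \Big( \sum_{p,q \in Q} \|\gamma_p(j-i)\|_2\, \|\gamma_q(j-i)\|_2 \Big) \Big( \sum_{p,q \in Q} \|\gamma_p(k')\|_2\, \|\gamma_q(k')\|_2 \Big)$$

and hence to

$$\Big( |Q| \sum_{q \in Q} \|\gamma_q(k)\|_2^2 \Big) \Big( |Q| \sum_{q \in Q} \|\gamma_q(j-i)\|_2^2 \Big) \Big( |Q| \sum_{q \in Q} \|\gamma_q(k')\|_2^2 \Big).$$

Lemma 4.6 shows that these three factors are bounded above, respectively, by
$K_1 \alpha_{[2]}^k$, $K_1 \alpha_{[2]}^{j-i}$ and $K_1 \alpha_{[2]}^{k'}$, for some constant $K_1$. Therefore

$$L(n,\ell,i,j) \le K_1^3\, \alpha_{[2]}^{k + (j-i) + k'} = K_1^3\, \alpha_{[2]}^\ell,$$

as announced. □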


Proposition 4.19. Let A be an irreducible Markovian automaton with coinci¬ 
dence probability ap]. Denote by (n, £, i, j) the probability for two reduced words 
of length n to have an occurrence of a factor of length £ in the first word at position 
i, and an occurrence of its inverse in the second word, at position j, with £ < n and 
i,j < n — i -£ 1. Then there exists a positive constant K such that 

L^^\n,£,i,j) < Kaf^y 

Proof. The proof follows the same steps as that of Proposition 4.18. In the
first case ($i, j \le n - \ell + 1$), Lemma 4.17 shows that

L^‘^\n,£,i,j) < E E l{p,uh{q,u ^). 

Since the set of reduced words of length £ and the set of their inverses are equal, 
we get, by the Cauchy-Schwarz inequality, 

L(^Hn,£,^,J)< ll>Wll2||7,W||2, 

p,q&Q 

and the proof proceeds as in the corresponding case of Proposition 4.18.

In the second case {i > n — i 1 and j<n — £-\-l),iik = n — i-\-l, then we 
have 

L^'^\n,£,i,j) < E E 7(p, ■ui)7(p', U2)l{q, U2 

P,P',Q^Q 

U2^'7Z£-k 


^ E E lip, Ul)jip', U 2 )liq, 'U-2 ^)li<l', 

ui^TZk p,p',q,q'&Q 
U 2 G'R-£-k 


< 


E E 

uiGTZk p,q'&Q 


liP,Uih{q',u^^)^(^ Y lip',U2)l{q,U2^)^ 


u2&'R-t-k p',q&Q 


and as in the previous case, the proof proceeds as in Proposition 4.18.

The situation is a little more complex in the last case {i,j > n — £-\-l). Without 
loss of generality, we may assume that i < j. With the same notation as in the 
proof of Lemma 4.18 we distinguish two cases. If |m 3 | < \u 2 \ (that is, £— k < k', or 
£+i+j < 2n+2), we let U 2 = U 2 U 2 with = |m 3 |. Then wi G uzA^~^uiu' 2 u 2 and 
W 2 G u '2 A^~^uf^U 2 ^ and, as in the previous proof, we find that i,j) 


'"1 ^ “3 “2 

is at most equal to the sum of the 

lip, ui)liq, uf^)lip', U2)-fiq', U2~^)lip", u'^^iq", u'Y 


)lip"',ushiq"',uf^) 


withui G TZj-i,U 2 G TZ£-k,U 2 G Tlk'-(e-k),U 3 G Tli-k, a.ndp,p',p'',p''',q,q',q'',q''' 
are states in Q. The proof then proceeds as before, with multiple applications of 
the Cauchy-Schwarz inequality. 
The case where $|u_3| > |u_2|$ (that is, $\ell + i + j > 2n + 2$) is handled in the same
fashion. □


Corollary 4.20. Let A be a non-degenerate ergodic Markovian automaton
with coincidence probability $\alpha_{[2]}$. Let n, $\ell$, i, j be positive integers such that $\ell \le n$
and $i, j \le n$. There exists a constant $K > 0$ such that the probability p that two
cyclically reduced words of length n have occurrences of the same word of length $\ell$
(resp. of a word of length $\ell$ and its inverse) as cyclic factors at positions respectively
i and j, satisfies $p \le K \alpha_{[2]}^\ell$.


Proof. The hypothesis on A guarantees that $\liminf \mathbb{R}_n(\mathcal{C}) = p > 0$ by
Proposition 4.14. Our statement then follows from Propositions 4.18 and 4.19, in view of
Lemma 3.13. □


We now consider the case of multiple occurrences of a length £ cyclic factor (or 
of such a word and its inverse) within a single reduced word. 


Proposition 4.21. Let A be a non-degenerate ergodic Markovian automaton 
with coincidence probability a[ 2 ]. There exists a constant K > 0 such that the 
probability that a cyclically reduced word of length n has two occurrences of a length 
£ word as cyclic factors, or occurrences of a length £ word and its inverse as cyclic 
factors, is at most .


Proof. By Proposition 4.9, the sequence $(\mathbb{R}_n)_n$ induced by A is prefix-heavy
with parameters $(C, \alpha_{[2]}^{1/2})$ for some C. The result then follows from Corollary 3.14. □

We can now proceed with the proof of Proposition 4.16. Let $N = \alpha_{[2]}^{-dn}$. An
N-tuple of cyclically reduced words which fails to satisfy $C'(\lambda)$ must satisfy one
of the following conditions: either two words in the tuple have occurrences of the
same cyclic factor of length $\ell = \lambda n$ or occurrences of such a word and its inverse;
or a word in the tuple has two occurrences of the same cyclic factor of length $\ell$ or
occurrences of such a word and its inverse.

By Corollary 4.20, the first event occurs with probability at most

$$K \binom{N}{2} n^2\, \alpha_{[2]}^{\lambda n} \le K n^2\, \alpha_{[2]}^{(\lambda - 2d)n}$$


for some K > 0. By Proposition 4.21, the second event occurs with probability at
most 

KN£‘^n‘^af^^ < , 

for some K > 0. Thus both events occur with probabilities that vanish exponentially
fast, and this concludes the proof of Proposition 4.16.


4.6. Long common prefixes at high density. In this section, we establish
the following propositions corresponding respectively to the second and third
statements of Theorem 4.15.


Proposition 4.22. Let A be a non-degenerate ergodic Markovian automaton
with coincidence probability $\alpha_{[2]}$. Let $\lambda \in (0, \frac{1}{2})$ and let $d \in (\frac{\lambda}{2}, 1)$. A tuple of
cyclically reduced words of length n taken independently and randomly according to
A, at density d, generically does not satisfy the small cancellation property $C'(\lambda)$.
Proposition 4.23. Let A be a non-degenerate ergodic Markovian automaton
with coincidence probability $\alpha_{[2]}$. Let E be the set of letters of $\tilde A$ which label a
transition in A and let $D = A \setminus (E \cup E^{-1})$. Let $d > \frac{1}{2}$ and $N \ge \alpha_{[2]}^{-dn}$, and let G be
a group presented by an N-tuple of cyclically reduced words chosen independently
at random according to A.

If $E \cap E^{-1} = \emptyset$, then $G = F(|D| + 1)$ exponentially generically.

If $E \cap E^{-1} \ne \emptyset$, then exponentially generically $G = F(D) * \mathbb{Z}/2\mathbb{Z}$ (if n is even)
or $G = F(D)$ (if n is odd).


Both proofs rely heavily on the methodology introduced by Szpankowski [32]
to study the typical height of a random trie. We first establish simple lower and
upper bounds for words to share a common prefix (Lemmas 4.24 and 4.25).


Lemma 4.24. Let A be an irreducible Markovian automaton with coincidence
probability $\alpha_{[2]}$. Let $P(n,\ell)$ (resp. $P'(n,\ell)$) be the probability that two reduced (resp.
cyclically reduced) words of length n share a common prefix of length $\ell$. There exists
a constant $K > 0$ such that $P(n,\ell) \ge K \alpha_{[2]}^\ell$.

If A is non-degenerate and ergodic and t is large enough for all the coefficients of
$M^t$ to be positive, then K can be chosen such that $P'(n,\ell) \ge K \alpha_{[2]}^\ell$ when $n \ge \ell + t + 1$.


Proof. Let p be a state such that $\gamma_0(p) > 0$. To establish the announced
lower bounds, we only need to consider the words that can be read from state p.
More precisely, when considering reduced words, we have

$$P(n,\ell) \ge \gamma_0(p)^2 \sum_{u \in \mathcal{R}_\ell} \gamma(p,u)^2.$$

We observe that $\sum_{u \in \mathcal{R}_\ell} \gamma(p,u)^2$ is the p-component of $M_{[2]}^\ell \mathbf{1}$, and by Lemma 4.6
it is greater than or equal to $\beta \alpha_{[2]}^\ell$, where $\beta$ is the minimal component of $v_{\min}$ (in
the notation of Lemma 4.6). This completes the proof of the statement concerning
$P(n,\ell)$.

We now consider cyclically reduced words, under the hypothesis that A is
non-degenerate and ergodic. Let t be such that all the coefficients of $M^t$ are positive,
let Pinin be the least coefficient of this matrix, and let Pmin be the least positive 


coefficient of M. Finally, let p = liminf IR„(C), which is positive by Proposition 4.14 


Let X (resp. Xp) be the set of pairs of cyclically reduced words of length n that 
have a common prefix of length £ (resp. which can be read from state p). We note 
that 


P'{n,£) = 


.{X) 






’-n{Xp), 


SO we only need to find a lower bound for K„(Xp). 

Suppose that n > 1. Then Xp contains the set of pairs of reduced words 

of the form (uuiu'iu, uu 2 U 2 a) which can be read from p, where a is the first letter 
of u, and u'.^ and words of length t such that p • {uuiu[) = p • {UU 2 U 2 ) = P- 

Since these words start and end with the same letters, they are guaranteed to be 
cyclically reduced. Thus we have 


^niXp) > 7o(p)^ ^ 7(p,m)^ p^i„ p^i„ > /3 7o(p)^ p^i„ p^in a[2], 

uG'R-e. 


and this concludes the proof. 


□ 
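
In the uniform case the lower bound of Lemma 4.24 can be made completely explicit. The sketch below is illustrative and rests on assumptions made here: the function names, the integer encoding of letters, and the use of the fact that under the uniform distribution on reduced words of length n, the prefix of length $\ell$ is itself uniform on reduced words of length $\ell$. It computes $P(n,\ell) = \sum_{u \in \mathcal{R}_\ell} \gamma_0(u)^2$ by enumeration.

```python
from fractions import Fraction
from itertools import product


def reduced_words(r, length):
    """Generate all reduced words of the given length over r generators."""
    letters = list(range(1, r + 1)) + [-i for i in range(1, r + 1)]
    for w in product(letters, repeat=length):
        if all(x != -y for x, y in zip(w, w[1:])):
            yield w


def common_prefix_probability(r, ell):
    """P(n, ell) in the uniform case, for any n >= ell."""
    prefixes = list(reduced_words(r, ell))
    weight = Fraction(1, len(prefixes))  # each prefix is equally likely
    return sum(weight ** 2 for _ in prefixes)
```

Since the uniform automaton has $\alpha_{[2]} = \frac{1}{2r-1}$, one finds $P(n,\ell) = \frac{2r-1}{2r}\,\alpha_{[2]}^\ell$, so the constant K of the lemma can be taken equal to $\frac{2r-1}{2r}$ in this case.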
Lemma 4.25. Let A be an irreducible Markovian automaton with coincidence
probability $\alpha_{[2]}$. There exists a constant $K > 0$ such that the probability that three
reduced words share the same prefix of length $\ell$ is at most $K \alpha_{[3]}^\ell$.

If A is non-degenerate and ergodic, the same holds for triples of cyclically
reduced words.


Proof. The probability p{u) that three reduced words have a common prefix 

u is 

p{u) = ^ loipi) lo{P2)'loips)j{p2,u)'y{p3,u). 

P1,P2,P3£Q 

The probability we are interested in is obtained by summing over all u S TZi- It is 
bounded above by 

E E -fipi,u) j{p 2 ,u) 'y{p3,u). 

Pi,P 2,P3&Q uGTie 

By the Holder and Cauchy-Schwarz inequalities, we have 


E 7(P2,u) j{p3,u) 

uCi'R.£ 


< 


< 


E 'y(,pi,uf 


Ku^'TZe 


E 7(pi,m)^ 


E 't{P2,uf^ 'y{P3,u)i 


Ku^'TZe 


2 

3 


E 7(P2,w)^ 


KuGTZe, 



1 

3 


Moreover, we have

\[ \sum_{p \in Q}\ \sum_{u \in \mathcal{R}_\ell} \gamma(p,u)^3 = \mathbf{1}^t\, M_{[3]}^{\ell}\, \mathbf{1}. \]

We now get the announced result using Lemma 4.6, Lemma 4.8 and the spectral properties of $M_{[3]}$. The generalisation to cyclically reduced words follows from Lemma 3.13. □
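
The two computational ingredients of this proof can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the two-state automaton `TRANS` is hypothetical (it is not taken from the paper and it ignores the reducedness constraint on words), and the matrix of cubed transition probabilities is assumed to play the role of $M_{[3]}$. It compares the sum of cubes computed by enumeration with the matrix form, and evaluates both sides of the Hölder and Cauchy-Schwarz bound.

```python
from itertools import product

# Hypothetical toy automaton (an assumption, not from the paper): two states,
# one outgoing transition per letter, TRANS[state][letter] = (target, probability).
TRANS = {
    0: {"a": (0, 0.5), "b": (1, 0.3), "c": (1, 0.2)},
    1: {"a": (0, 0.6), "b": (1, 0.1), "c": (0, 0.3)},
}
STATES = sorted(TRANS)
LETTERS = "abc"


def gamma(p, word):
    """Probability of reading `word` from state p (product of transition probabilities)."""
    prob = 1.0
    for letter in word:
        p, step = TRANS[p][letter]
        prob *= step
    return prob


def cubes_by_enumeration(length):
    """Sum of gamma(p, u)**3 over all states p and all words u of the given length."""
    return sum(
        gamma(p, u) ** 3
        for p in STATES
        for u in product(LETTERS, repeat=length)
    )


def cubes_by_matrix(length):
    """Same quantity computed as 1^t M3^length 1, with M3[p][q] a sum of cubed probabilities."""
    m3 = [[0.0] * len(STATES) for _ in STATES]
    for p in STATES:
        for target, step in TRANS[p].values():
            m3[p][target] += step ** 3
    vec = [1.0] * len(STATES)  # all-ones column vector
    for _ in range(length):
        vec = [sum(m3[p][q] * vec[q] for q in STATES) for p in STATES]
    return sum(vec)


def triple_sum(p1, p2, p3, length):
    """Left-hand side of the Hölder / Cauchy-Schwarz bound."""
    return sum(
        gamma(p1, u) * gamma(p2, u) * gamma(p3, u)
        for u in product(LETTERS, repeat=length)
    )


def cube_norm(p, length):
    """(Sum over words u of gamma(p, u)**3) raised to the power 1/3."""
    return sum(gamma(p, u) ** 3 for u in product(LETTERS, repeat=length)) ** (1 / 3)
```

For any triple of states, `triple_sum(p1, p2, p3, length)` should not exceed the product of the three `cube_norm` values, and the two ways of computing the sum of cubes should agree up to rounding.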


We now build on the previous lemmas to show that, exponentially generically, 
large tuples of cyclically reduced words contain pairs of words with a common prefix 
of a prescribed length. 

Proposition 4.26. Let $\mathcal{A}$ be an irreducible Markovian automaton with coincidence probability $\alpha_{[2]}$. Let $(\ell_n)_n$ be an unbounded, monotonic sequence of positive integers such that $\ell_n < n$ for each $n$, and let $d > \frac12$. Then an $\alpha_{[2]}^{-d\ell_n}$-tuple of reduced words of length $n$ drawn randomly according to $\mathcal{A}$ generically contains two words with the same prefix of length $\ell_n$.

If $\mathcal{A}$ is non-degenerate and ergodic, the same holds for $\alpha_{[2]}^{-d\ell_n}$-tuples of cyclically reduced words.
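
The statement is a birthday-paradox phenomenon, which a small simulation can illustrate. The sketch below is not the paper's construction: it assumes the uniform distribution on reduced words over two generators, for which the coincidence probability is assumed to be $1/3$ (one over the number of letters allowed after a given letter), and the helper names are hypothetical. Only prefixes of length $\ell$ are drawn, since in this model the prefix of a longer uniform reduced word has the same distribution.

```python
import random

# Letters 0, 1 stand for a, a^-1 and letters 2, 3 for b, b^-1 (illustrative encoding).
INVERSE = {0: 1, 1: 0, 2: 3, 3: 2}


def random_reduced_prefix(length, rng):
    """Uniform random reduced word of the given length over a, a^-1, b, b^-1."""
    word = [rng.randrange(4)]
    while len(word) < length:
        forbidden = INVERSE[word[-1]]
        word.append(rng.choice([x for x in range(4) if x != forbidden]))
    return tuple(word)


def has_prefix_collision(num_words, prefix_length, rng):
    """True if two of the num_words random words share their prefix of the given length."""
    seen = set()
    for _ in range(num_words):
        prefix = random_reduced_prefix(prefix_length, rng)
        if prefix in seen:
            return True
        seen.add(prefix)
    return False


def tuple_size(d, prefix_length, alpha=1 / 3):
    """Tuple size alpha^(-d * prefix_length), rounded to an integer."""
    return round(alpha ** (-d * prefix_length))
```

With `prefix_length = 10` there are $4 \cdot 3^9$ possible prefixes. A tuple of size `tuple_size(0.9, 10)` is far above the square root of that number, so a collision is expected, whereas a tuple of size `tuple_size(0.2, 10)` is far below it.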


Proof. We use the so-called second moment method, as developed in [32], and we introduce the following notation to this end. Since the results of [32] are established for right-infinite words, we need to consider such words first; the result on words of length $n$ directly follows by truncation. A right-infinite reduced word is an element $u$ of $\tilde{A}^{\mathbb{N}}$ such that for every $i \in \mathbb{N}$, $u_i \ne u_{i+1}^{-1}$. We define the probability distribution $\mathbb{R}_\infty$ on right-infinite words induced by the Markovian automaton $\mathcal{A}$ by






first setting $\mathbb{R}_\infty(\mathcal{P}_\infty(u)) = \gamma(u)$, where $\mathcal{P}_\infty(u)$ is the set of right-infinite reduced words $w$ such that the finite reduced word $u$ is a prefix of $w$. The probability is then extended to the $\sigma$-algebra generated by the $\mathcal{P}_\infty(u)$, when $u$ ranges over all finite reduced words (see [34] for more details on this kind of construction). Let $N = \alpha_{[2]}^{-d\ell_n}$ and consider an $N$-tuple $h = (h_i)_{1 \le i \le N}$ of right-infinite reduced words, independently and randomly generated according to $\mathcal{A}$.

For $1 \le i < j \le N$, let $X_{i,j}$ be the random variable computing the length of the longest common prefix of $h_i$ and $h_j$. We want to show that, exponentially generically,

\[ \max_{1 \le i < j \le N} X_{i,j} \ge \ell_n. \]

Let us relabel the random variables $X_{i,j}$ ($i < j$) as $Y_1, \ldots, Y_m$, with $m = \binom{N}{2}$ and, say, $Y_1 = X_{1,2}$. We are therefore computing the maximum of $m$ random variables, which are identically distributed but not independent. Fortunately, they behave almost as if they were independent, as we will see.

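Let $d'$ be such that $\frac12 < d' < d$ and for each $m \ge 1$, let

\[ r_m = \log_{\alpha_{[2]}^{-2d'}}(m) = \frac{\log m}{\log \alpha_{[2]}^{-2d'}} \sim \frac{\log \alpha_{[2]}^{-2d\ell_n}}{\log \alpha_{[2]}^{-2d'}} = \frac{d\,\ell_n}{d'}. \]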


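In particular, $r_m$ is asymptotically greater than $\ell_n$, and we only need to show that

\[ (7) \qquad \lim_{n \to \infty} \mathbb{R}_\infty\Big(\max_{k \in [m]} Y_k > r_m\Big) = 1. \]

Let $v(r_m)$ denote the quantity

\[ v(r_m) = \sum_{k=2}^{m} \frac{\mathbb{R}_\infty(Y_1 > r_m,\ Y_k > r_m)}{m\,\mathbb{R}_\infty(Y_1 > r_m)^2}. \]

We use Lemma 3 in [32], which states that the desired equation (7) holds if

\[ \lim_{n \to \infty} m\,\mathbb{R}_\infty(Y_1 > r_m) = +\infty \quad\text{and}\quad \lim_{n \to \infty} v(r_m) = 1. \]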

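We now proceed with the proof of these two equalities. By Lemma 4.24 we have $\mathbb{R}_\infty(Y_1 > r_m) \ge K \alpha_{[2]}^{r_m}$. Then

\[ \log\big(m\,\mathbb{R}_\infty(Y_1 > r_m)\big) \ge \log m + \log K + r_m \log \alpha_{[2]} \]
\[ = r_m \log\big(\alpha_{[2]}^{-2d'}\big) + \log K + r_m \log \alpha_{[2]} \]
\[ = r_m \log\big(\alpha_{[2]}^{1-2d'}\big) + \log K, \]

which tends to $+\infty$, since $1 - 2d' < 0$ and $\alpha_{[2]} < 1$. Therefore,

\[ \lim_{n \to \infty} m\,\mathbb{R}_\infty(Y_1 > r_m) = +\infty. \]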


Let us now consider $v(r_m)$. Note that, if the $Y_i$ were independent random variables, we would have $v(r_m) = \frac{m-1}{m}$, which tends to 1 when $n$ tends to $\infty$.

Observe that if $2 < i < j \le N$, then $X_{1,2}$ and $X_{i,j}$ are independent and identically distributed, so

\[ \mathbb{R}_\infty(X_{1,2} > r_m,\ X_{i,j} > r_m) = \mathbb{R}_\infty(X_{1,2} > r_m)^2 = \mathbb{R}_\infty(Y_1 > r_m)^2. \]
Also, since $h_1$ and $h_2$ are drawn independently, we have $\mathbb{R}_\infty(X_{1,2} > r_m,\ X_{1,k} > r_m) = \mathbb{R}_\infty(X_{1,2} > r_m,\ X_{2,k} > r_m)$ for each $k \ge 3$. Therefore

\[ v(r_m) = 2 \sum_{k=3}^{N} \frac{\mathbb{R}_\infty(X_{1,2} > r_m,\ X_{1,k} > r_m)}{m\,\mathbb{R}_\infty(Y_1 > r_m)^2} + \binom{N-2}{2} \frac{1}{m}. \]



Since $m = \binom{N}{2}$, we have $\lim_n \binom{N-2}{2} \frac{1}{m} = 1$. Moreover, the joint probability $\mathbb{R}_\infty(X_{1,2} > r_m,\ X_{1,k} > r_m)$ is exactly the probability that three random reduced words share a common prefix of length $r_m$: by Lemma 4.25 this is at most equal


to K for some constant AT > 0. Together with Lemma 


4.24 


this yields 


N . 

k —3 


(-Ai,2 > ^ m ; Ali,fc > 
TO Moo (by > ?'m)^ 



for some K' > 0. In |16j it is proved that is a decreasing sequence, so we 

have and hence 


Therefore 



log 




= — log A1 


~Y 


log 0 ( 2 ] < 


— - log TO + K" 


'^m 


log a [ 2 ] 


for some constant $K''$. By definition of $r_m$, we have $\log m = -2d' r_m \log \alpha_{[2]}$ and it
follows that 

This quantity tends to $-\infty$ when $n$ tends to $\infty$ since $2d' - 1 > 0$ and $\alpha_{[2]} < 1$. This finally proves that $\lim_{m \to \infty} v(r_m) = 1$ and establishes Equation (7). That is, the desired statement is proved for tuples of infinite reduced words. As $\ell_n < n$, considering right-infinite words and truncating them at their prefix of length $n$ yields the same result. By construction, the probability distribution induced on these truncated words is exactly $\mathbb{R}_n$, concluding the proof.

The generalisation to cyclically reduced words follows from Lemma 3.13. □


We now use Proposition 4.26 to prove Proposition 4.22.

Proof of Proposition 4.22. Let $0 < \lambda < \frac12$. Proposition 4.26 applied to $\ell_n = \lambda n$ shows that, if $\frac12 < d < 1$, then a random $\alpha_{[2]}^{-d\lambda n}$-tuple $h$ of cyclically reduced words of length $n$ generically has two components $h_i$ and $h_j$ with the same prefix of length $\lambda n$, which is sufficient to show that $h$ does not satisfy Property $C'(\lambda)$. □


We now translate the result of Proposition 4.26 into a result on the group presented by a random $\alpha_{[2]}^{-dn}$-tuple, when $d > \frac12$. We will repeatedly use Chernoff bounds [Th. 4.2 p.70], which state that, in a binomial distribution with parameters $(k,p)$ (that is, $X_k$ is the sum of $k$ independent draws of 0 or 1 and $p$ is the probability of drawing 1),

\[ P\Big(X_k < \frac{kp}{2}\Big) < \exp\Big(-\frac{kp}{8}\Big). \]

In other words,

\[ (8) \qquad P\Big(X_k \ge \frac{kp}{2}\Big) > 1 - \exp\Big(-\frac{kp}{8}\Big). \]
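
As a sanity check, the lower-tail bound can be compared with the exact binomial tail. The sketch below assumes that the bound takes the standard multiplicative form $P(X_k < kp/2) < \exp(-kp/8)$ (the constants are an assumption, since the displayed formula is damaged in this copy), and the function names are illustrative only.

```python
from math import comb, exp


def binomial_lower_tail(k, p):
    """Exact P(X < k*p/2) for X following a binomial distribution with parameters (k, p)."""
    threshold = k * p / 2
    return sum(
        comb(k, j) * p**j * (1 - p) ** (k - j)
        for j in range(k + 1)
        if j < threshold
    )


def chernoff_bound(k, p):
    """Assumed Chernoff upper bound exp(-k*p/8) on the lower tail P(X < k*p/2)."""
    return exp(-k * p / 8)
```

For instance, comparing `binomial_lower_tail(200, 0.3)` with `chernoff_bound(200, 0.3)` shows how the bound decays exponentially in $kp$, which is the only feature used in the arguments that follow.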




If $h$ is a vector of cyclically reduced words, $G$ is the group presented by $G = \langle A \mid h \rangle$ and $u$, $v$ are reduced words, we write $u =_G v$ if $u$ and $v$ have the same projection in $G$ (that is, if $uv^{-1}$ lies in the normal closure of $h$).


Proposition 4.27. Let $\mathcal{A}$ be an ergodic Markovian automaton with coincidence probability $\alpha_{[2]}$ and let $a, b \in \tilde{A}$ be labels of transitions in $\mathcal{A}$. Let $d > \frac12$ and $N \ge \alpha_{[2]}^{-dn}$, and let $G$ be a group presented by an $N$-tuple of cyclically reduced words chosen at random according to $\mathcal{A}$. Then $a =_G b$ exponentially generically.

Proof. Let $t > 0$ be such that all the coefficients of $M^t$ are positive (such an integer exists since $M$ is ergodic) and let $\tau > 0$ be the minimum coefficient of $M^t$.

We proceed in two steps. First we consider transitions starting in the same 
state of the Markovian automaton and second we generalize the study to transitions 
beginning in different states of the automaton. 

First step of the proof. We show that if $x = x_1 \cdots x_s$ and $y = y_1 \cdots y_s$ are reduced words of equal length $s \ge 1$ which label paths in $\mathcal{A}$ out of the same state $q$, then exponentially generically, we have $x_k =_G y_k$ for each $1 \le k \le s$.

Recall that, in our model of Markovian automata, drawing a word of length $n$ amounts to drawing a state $r \in Q$ according to $\gamma_0$, and then drawing a word of length $n$ according to $\gamma(r, -)$. Thus, when drawing a tuple $h = (h_i)_i$, we also draw a tuple $q = (q_i)_i$ of states such that, in particular, $\gamma_0(q_i) > 0$ and $\gamma(q_i, h_i) > 0$.

Let $r$ be a state such that $\gamma_0(r) > 0$. Let $T_0 = \{h_i \in h \text{ such that } q_i = r\}$ and $N_0 = |T_0|$. Observe that drawing randomly and independently $N$ words of length $n$ in our model and then keeping only those starting in state $r$ to obtain $T_0$ is the same as first choosing $N_0$ according to a binomial law of parameters $(N, \gamma_0(r))$ and then drawing randomly and independently $N_0$ words beginning in state $r$. Moreover, Chernoff bounds (Equation (8) above, applied with $p = \gamma_0(r)$ and $k = N$) show that $P\big(N_0 \ge \gamma_0(r)\frac{N}{2}\big) \ge p_0$ with $p_0 = 1 - \exp\big(-\frac{\gamma_0(r) N}{8}\big)$.

For each $s \ge 1$, we say that a pair of indices $(i,j)$ is an $s$-collision in $T_0$ if $h_i$ and $h_j$ belong to $T_0$ and have the same prefix of length $n - t - s$. Let $\varepsilon$ be
such that 0 < e < d — I and let N' = Then a random A^g-tuple of 

cyclically reduced words starting in r is obtained by drawing ^ times a random 
$N'$-tuple starting in state $r$. Moreover, choosing a random word in a Markovian automaton given that the associated path begins in state $r$ is the same as taking for initial probability vector $\gamma_0$ the probability vector such that $\gamma_0(r) = 1$. Since the conclusion of Proposition 4.26 does not depend on the initial probability vector and $d - \varepsilon > \frac12$, Proposition 4.26 applied to $\ell_n = n - t - s$ shows that a random $N'$-tuple of cyclically reduced words that starts in $r$ generically exhibits at least one $s$-collision in $T_0$.

We assume that $n$ is large enough so that the probability of an $s$-collision in $T_0$ of a random $N'$-tuple is at least $\frac12$. Then Chernoff bounds (Equation (8), applied with $p = \frac12$ and $k = N_0$) show that the set $T_1$ of $s$-collisions in $T_0$ of a random
tuple of cyclically reduced words of length n satisfies [Til > with probability 
greater than or equal to pi = 1 — exp(—^). 

For each $s$-collision $(i,j) \in T_1$, we let $u(i,j)$ be the common length $n - t - s$ prefix of $h_i$ and $h_j$. Then by a finiteness argument, there exists a state $q_1 \in Q$ and a set $T_2 \subseteq T_1$ such that, for every $(i,j) \in T_2$, $u(i,j)$ labels a path from $r$ to $q_1$ in
A, and \T 2 \ > Hence 1121 > with probability greater than or equal to pi. 

Now let $v$ be a reduced word of length $t$, labeling a path in $\mathcal{A}$ from $q_1$ to $q$: such a word exists since all the coefficients of $M^t$ are positive, and it is read from $q_1$ with probability at least $\tau$. For each $(i,j) \in T_2$, the probability that $h_i$ starts with $u(i,j)v$ is at least $\tau$, and the probability that $u(i,j)v$ is a prefix of both $h_i$ and $h_j$ is at least $\tau^2$. We can apply Chernoff bounds (8) again, with $p = \tau^2$ and $k = |T_2|$: then the subset $T_3 \subseteq T_2$ of pairs $(i,j)$ such that $u(i,j)v$ is a prefix of both $h_i$ and $h_j$ has cardinality $|T_3| \ge \frac12 |T_2| \tau^2$ with probability at least $p_2 = 1 - \exp\big(-\frac{|T_2| \tau^2}{8}\big)$.

Finally, we note that $|u(i,j)v| = n - s$, so for each $(i,j) \in T_3$, we have $h_i = u(i,j)vx$ with probability $\gamma(q,x)$. Therefore the probability that $(h_i, h_j) = (u(i,j)vx,\ u(i,j)vy)$ is $\gamma(q,x)\gamma(q,y)$, which is positive by hypothesis. Applying Chernoff bounds one more time (with $k = |T_3|$ and $p = \gamma(q,x)\gamma(q,y)$) shows that $h$ contains a pair of words of the form $(wx, wy)$ with probability at least $p_3$ with

\[ p_3 = 1 - \exp\Big(-\frac{|T_3|\,\gamma(q,x)\,\gamma(q,y)}{8}\Big). \]

In conclusion, exponentially generically $N_0 \ge \gamma_0(r)\frac{N}{2}$, which implies that $p_1$
is exponentially close to 1. Hence T 2 > exponentially generically, which 

implies that p 2 is exponentially close to 1. So [Taj > exponentially generi¬ 

cally, which implies that pa is exponentially close to 1. In particular, exponentially 
generically, h has a pair of the form {wx, wy), and hence we have x =c y- 

Applying this to the words $x_1$ and $y_1$, we find that $x_1 =_G y_1$. Next, considering the words $x_1 x_2$ and $y_1 y_2$, we find that $x_1 x_2 =_G y_1 y_2$, and hence $x_2 =_G y_2$. Iterating this reasoning, we finally show that $x_k =_G y_k$ for each $1 \le k \le s$.

Second step of the proof. We now consider two transitions in $\mathcal{A}$, one labeled $a$ from state $q$ to state $q'$ and another labeled $b$ from state $r$ to state $r'$ ($a, b \in \tilde{A}$).

Let $q_0 \in Q$ be a state in $\mathcal{A}$ such that $\gamma_0(q_0) > 0$. Since $\mathcal{A}$ is irreducible, there exists a word $w_1$ which labels a loop at $q_0$ and visits every transition of $\mathcal{A}$. Moreover, since $\mathcal{A}$ is ergodic, there exists a word $w_2$ labeling another loop at $q_0$, such that $|w_1|$ and $|w_2|$ are relatively prime.

Since reading $w_1$ from $q_0$ visits all the transitions, let $u_1$ (resp. $v_1$) be a prefix of $w_1$ such that the last transition read after reading $u_1$ (resp. $v_1$) is the $a$-transition out of state $q$ (resp. the $b$-transition out of state $r$). Then the Chinese remainder theorem shows that there exist words $x \in \{w_1, w_2\}^* u_1$ and $y \in \{w_1, w_2\}^* v_1$ of equal length.

Since $a$ and $b$ are the last letters of $x$ and $y$, respectively, the first step of the proof shows that $a =_G b$, which concludes the proof of the proposition. □
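
The existence of two words of equal length in the second step only depends on the lengths involved. The sketch below makes this explicit; it uses a direct search based on coprimality rather than an explicit appeal to the Chinese remainder theorem, and the function names and the choice of the common length are assumptions made for illustration.

```python
from math import gcd


def represent(total, a, b):
    """Write total as i*a + j*b with i, j >= 0, assuming gcd(a, b) == 1 and total >= a*b."""
    for i in range(b):
        rest = total - i * a
        if rest >= 0 and rest % b == 0:
            return i, rest // b
    raise ValueError("no representation found")


def equal_length_exponents(len_w1, len_w2, len_u1, len_v1):
    """Return ((i, j), (i2, j2), L) such that
    i*len_w1 + j*len_w2 + len_u1 == i2*len_w1 + j2*len_w2 + len_v1 == L."""
    if gcd(len_w1, len_w2) != 1:
        raise ValueError("the loop lengths must be relatively prime")
    # Any target at least len_w1 * len_w2 above both prefix lengths is reachable.
    common = max(len_u1, len_v1) + len_w1 * len_w2
    x_side = represent(common - len_u1, len_w1, len_w2)
    y_side = represent(common - len_v1, len_w1, len_w2)
    return x_side, y_side, common
```

Here `(i, j)` counts how many copies of $w_1$ and $w_2$ are concatenated before $u_1$ to form $x$, and `(i2, j2)` plays the same role for $y$; the order in which the loops are concatenated does not affect the length.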


We can now complete the proof of Proposition 4.23. By Proposition 4.27, exponentially generically, all the letters in $E$ are equal in $G$. If $a, a^{-1} \in E$ for some letter $a$, then all these letters are equal to their own inverse in $G$, so the subgroup $H$ of $G$ generated by $E$ is a quotient of $\mathbb{Z}/2\mathbb{Z}$. Since all the relators in the presentation have length $n$, it follows that $H$ is isomorphic to $\mathbb{Z}/2\mathbb{Z}$ if $n$ is even, and is trivial if $n$ is odd. The result follows once we observe that the letters in $D$ do not occur in any relator. □